ABSTRACT

We present the largest Wiener reconstruction of the cosmic density field made to date. The
reconstruction is based on the Sloan Digital Sky Survey data release 6 covering the northern
Galactic cap. We use a novel supersampling algorithm to suppress aliasing effects and a
Krylov-space inversion method to enable high performance with high resolution. These
techniques are implemented in the ARGO computer code. We reconstruct the field over a 500 Mpc
cube with Mpc grid-resolution while accounting both for the angular and radial selection
functions of the SDSS, and the shot noise, giving an effective resolution of the order of ~10 Mpc.
In addition, we correct for the redshift distortions in the linear and nonlinear regimes in an
approximate way. We show that the commonly used method of inverse weighting the galaxies
by the corresponding selection function leads to excess noise in regions where the density
of the observed galaxies is small. It is more accurate and conservative to adopt a Bayesian
framework in which we model the galaxy selection/detection process to be Poisson-binomial.
This results in heavier smoothing in regions of reduced sampling density. Our results show
a complex cosmic web structure with huge void regions, indicating that the recovered matter
distribution is highly non-Gaussian. Filamentary structures are clearly visible on scales
up to ~20 Mpc. We also calculate the statistical distribution of density after smoothing the
reconstruction with Gaussian kernels of different radii r_S and find good agreement with a
log-normal distribution for 10 Mpc < r_S < 30 Mpc.

1 INTRODUCTION

Measuring the Large-Scale Structure (LSS) of the Universe has 
become a major task in cosmology in recent years. The relics of 
the seed fluctuations, originating from the inflationary phase of the 
early Universe, are mainly encoded in the linear regime of the LSS 
in which structure formation has not significantly degraded the pri- 
mordial phase information. In particular there has recently been 
a focus on measuring the baryon acoustic signal imprinted in the 
galaxy distribution which has been suggested as a powerful stan- 
dard ruler for our Universe (see for example Eisenstein 2005). 

Upcoming and ongoing galaxy redshift surveys such as DEEP2 or the
Baryon Oscillation Spectroscopic Survey (BOSS) will cover higher and
higher redshifts (see for example Davis et al. 2005; Schlegel et al.
2007). They are designed to trace complex structures in the Universe
and to study the environment of galaxies and their evolution.

We carry out a reconstruction of the density field, dealing with
statistical and systematic errors of the galaxy distribution, with the
ARGO computer code described in Kitaura & Enßlin (2008). ARGO is a
high-performance implementation of a three-dimensional Wiener-filter,
permitting treatments of an inhomogeneous and incomplete window
function acting on the galaxy distribution. It exploits the power of
fast Fourier transforms (FFTs) and iterative Krylov-space based
inversion schemes for the otherwise intractable data inversion step.

Reconstructions permit us to characterize the large-scale structure,
helping to deepen our understanding of structure formation, to gain
insight into the physical processes involved, and to construct signal
templates for the detection of weak physical effects. These can be
used to study the cosmic microwave background and to reveal signals
ranging from the Integrated Sachs-Wolfe effect (see for example
Frommert et al.), through the Sunyaev-Zel'dovich effect in the diffuse
gas, to metal absorption lines. An interesting further application
would be to constrain the bias between luminous and dark matter using
reconstructions made by ARGO and correlating them with simulations and
reconstructions of the matter distribution coming from other
observables like weak lensing, the Lyman alpha forest, etc.
Topological studies could be made from the reconstructed data, leading
to a geometrical characterization of the actual large-scale structure
(see for example Sheth & Sahni 2005). It is also interesting to study
how the physical properties of galaxies depend on their large-scale
environment (Li et al. 2006b; Lee & Lee; Lee & Li 2008). The
reconstructed structures of a galaxy catalogue can be traced back in
time with various methods, like those based on the Zel'dovich
approximation (see for example Nusser & Dekel 1992). These early
matter density fluctuations can be used as initial conditions for
N-body simulations. The results of such a constrained simulation have
a wide application in structure formation theory (see for example
Mathis et al. 2002). A joint estimation of the matter field and its
power-spectrum would also be a natural next step given the technology
we develop below (for similar work in CMB-analysis see, for example,
Wandelt et al. 2004; Jewell et al. 2004; Eriksen et al. 2007).

We present the first application of ARGO to observational data. In
particular we have applied our method to recover the galaxy density
field based on data from Sample dr6fix of the New York University
Value Added Catalogue (NYU-VAGC), which was constructed from the Sloan
Digital Sky Survey (SDSS; York et al. 2000) Data Release 6 (DR6;
Adelman-McCarthy et al. 2008). This leads to the largest
Wiener-reconstruction of the Large-Scale Structure made so far,
effectively requiring the inversion of a matrix with about
10^8 x 10^8 entries. The use of optimized iterative inversion schemes
within an operator formalism (see Kitaura & Enßlin 2008), together
with a careful treatment of aliasing effects (see Jasche et al. 2009),
permits us to recover the overdensity field on Mpc scales (for
previous Wiener reconstructions see Fisher et al. 1993; Hoffman 1994;
Lahav et al. 1994; Lahav 1994; Zaroubi et al. 1995; Fisher et al.
1995; Webster et al. 1997; Zaroubi et al. 1999; Schmoldt et al. 1999;
Erdoğdu et al. 2004, 2006). Note that alternative density
reconstruction techniques like Voronoi and Delaunay tessellations (see
e.g. Icke & van de Weygaert 1991; Ebeling & Wiedenmann; Zaninetti
1995; Bernardeau & van de Weygaert; Doroshkevich et al. 1997; Meurs &
Wilkinson; Kim et al. 2000; Ramella et al. 2001; van de Weygaert &
Schaap 2001; Panko & Flin 2004; Zaninetti 2006) are tuned to optimally
represent the density field from a geometrical point of view, but are
not explicit in the statistical assumptions made on the galaxy or
matter distribution, which is an important aspect of our analysis
here.

We investigate in detail the statistical problem of finding an
expression for a noise covariance which includes the survey angular
and radial selection functions. The expression we find assumes a
binomial model for the galaxy selection/detection process.

We show that including our proposed noise covariance matrix in the
Wiener-filter leads to a more conservative reconstruction of matter
structures than using the inverse weighting scheme. We also compare
the linear WF expression, which is derived from a least squares
approach, and the non-linear WF, which uses a signal-dependent noise
covariance (see appendix A in Kitaura & Enßlin 2008). The latter
proves to be even more conservative than the linear WF since it
strongly suppresses the cells with higher number counts.

Due to the fine mesh of the reconstruction (~1 Mpc) a treatment of the
redshift distortion in the linear and non-linear regime is required.
We choose a redshift distortion deconvolution method, as presented by
Erdoğdu et al. (2004), which aims to correct in both regimes. This
treatment only corrects the power and neglects any phase information.
For this reason, the effective resolution of the reconstruction
(~10 Mpc) is lower than the resolution of the grid.

Our paper is structured as follows. We start by describing the input
galaxy sample of the Sloan Digital Sky Survey (SDSS) Data Release 6
(DR6) in section 2. Then we present the methodology used to perform an
estimation of the matter field (section 3). In detail, the galaxy
distribution is first transformed into the comoving frame (section
3.1.1) and then assigned to a grid using our newly developed
supersampling method (described in Jasche et al. 2009) to correct for
aliasing effects, ensuring a correct spectral representation of the
galaxy distribution even up to the highest modes contained in the grid
(section 3.1.2). Completeness on the sky and radial galaxy selection
function are then translated into a three dimensional mask, which will
be part of the response operator used in the filtering step (section
3.1.3). Then, an observed galaxy overdensity field is calculated which
fulfils the statistical requirements we want to impose on the matter
field (section 3.2). Taking the observed galaxy field as the data
vector we finally apply a Wiener-filtering step with the ARGO computer
code (section 3.3.1) followed by a deconvolution step, effectively
correcting for the redshift distortion (section 3.3.2). Here, we
distinguish between a linear WF expression which is derived from a
least squares approach and a non-linear WF which uses a
signal-dependent noise covariance. Both WF formulations are tested
with mock data and quantitatively compared to a simple procedure in
which the galaxies are inverse weighted with the completeness, then
gridded and finally smoothed to give a matter field estimate.

We present a reconstruction of the density field for the DR6 main
sample in section 5. First, we analyze the survey sky mask (section
5.1). Results for the Sloan Great Wall are then presented in detail.
Some other prominent structures, for example, the Coma, the Leo, and
the Hercules clusters are also discussed (section 5.2) together with
the detection of a large void region (section 5.3). The proper
implementation of the filter enables us to deal with complex masks
which include unobserved regions. We demonstrate the improved
detection of overdensity regions close to edges of the mask and the
prediction of structures in gaps, as demonstrated by comparing with
data from the Data Release 7 (DR7) where those gaps are filled
(section 5.4). In section 5.5 we analyze the statistical distribution
of the density field and find good agreement with a log-normal
distribution for smoothing radii in a Gaussian filter in the range
10 Mpc < r_S < 30 Mpc. Finally, we make a summary of the work, and
present our conclusions and future outlook.

2 INPUT GALAXY SAMPLE

In this study we use data from the sixth data release (DR6;
Adelman-McCarthy et al. 2008) of the Sloan Digital Sky Survey (SDSS;
York et al. 2000). The survey contains images of a quarter of the sky
obtained using a drift-scan camera (Gunn et al. 1998) in the u, g, r,
i, z bands (Fukugita et al. 1996; Smith et al. 2002; Ivezić et al.
2004), together with spectra of almost a million objects obtained with
a fibre-fed double spectrograph (Gunn et al. 2006). Both instruments
were mounted on a special-purpose 2.5 meter telescope (Gunn et al.
2006) at Apache Point Observatory. The imaging data are
photometrically (Hogg et al. 2001; Tucker et al. 2006) and
astrometrically (Pier et al. 2003) calibrated, and were used to select
spectroscopic targets for the main galaxy sample (Strauss et al.
2002), the luminous red galaxy sample (Eisenstein et al. 2001), and
the quasar sample (Richards et al. 2002). Spectroscopic fibres are
assigned to the targets using an efficient tiling algorithm designed
to optimize completeness (Blanton et al. 2003c). The details of the
survey strategy can be found in York et al. (2000) and an overview of
the data pipelines and products is provided in the Early Data Release
paper (Stoughton et al. 2002). More details on the photometric
pipeline can be found in Lupton et al. (2001) and on the spectroscopic
pipeline in SubbaRao et al. (2002).

We take data from Sample dr6fix of the New York University Value Added
Catalogue (NYU-VAGC). This is an update of the catalogue constructed
by Blanton et al. (2005) and is based on the SDSS DR6 data and
publicly available selection masks. Starting from Sample dr6fix, we
construct a magnitude-limited sample of galaxies with
spectroscopically measured redshifts in the range 0.001 < z < 0.4,
r-band Petrosian apparent magnitudes $14.5 < m \le 17.6$, and r-band
absolute magnitudes $-23 < M_{0.1r} < -17$. Here m is corrected for
Galactic extinction, and the apparent magnitude limits are chosen in
order to get a sample that is uniform and complete over the entire
area of the survey. The absolute magnitude $M_{0.1r}$ is corrected to
its z = 0.1 value using the K-correction code of Blanton et al.
(2003a) and the luminosity evolution model of Blanton et al. (2003b).
We also restrict ourselves to galaxies located in the main area of the
survey in the northern Galactic cap, excluding the three survey strips
in the southern cap, i.e. we include galaxies with right ascension
($\alpha$) and declination ($\delta$) in the following ranges:
$105^\circ < \alpha < 270^\circ$ and $-5^\circ < \delta < 70^\circ$.
In addition, we considered only galaxies which are inside a comoving
cube of side 500 Mpc (with equal side lengths: $L_x \times L_y \times
L_z$), as we describe below. These restrictions result in a final
sample of 255,818 galaxies.

In order to correct for incompleteness in our spectroscopic 
sample, we need to have complete knowledge of its selection ef- 
fects. A detailed account of the observational selection effects ac- 
companies the NYU-VAGC release. These include two parts: a 
mask on the sky and a radial selection function along the line-of- 
sight. The mask shows which areas of the sky have been targeted, 
and which have not, either because they are outside the survey 
boundary, because they contain a bright confusing source, or be- 
cause observing conditions were too poor to obtain all the required 
data. The effective area of the survey on the sky defined by this 
mask is 5314 square degrees for the sample we use here. It is di- 
vided into a large number of smaller subareas, called polygons, for 
each of which the NYU-VAGC lists a spectroscopic completeness.
This is defined as the fraction of the photometrically defined target
galaxies in the polygon for which usable spectra were obtained. The
average completeness over our sample galaxies is 0.86. The radial
selection function gives the fraction of galaxies in the absolute
magnitude range being considered ($-23 < M_{0.1r} < -17$ in our case)
that are within the apparent magnitude range of the sample
($14.5 < m \le 17.6$ in our case) at a given redshift.

In certain cases we also work with a sample of galaxies drawn from
SDSS data release 7 (DR7; Abazajian et al. 2009) for which the galaxy
positions, redshifts and fluxes are publicly available from the SDSS
website, but the survey completeness as described above was not
released at the moment this work started. With this sample we apply
only a gridding scheme and a subsequent Gaussian smoothing, without
accounting for any selection effects, in order to qualitatively check
for overdense regions present in the gap in the SDSS DR6.



3 METHODOLOGY 

In this section, we describe the main algorithms required to perform a
Wiener-filter reconstruction of the matter field as described in
Kitaura & Enßlin (2008) (see also the pioneering works Wiener 1949;
Rybicki & Press 1992; Zaroubi et al. 1995). We start with the
preparation of the data followed by a filtering step and a final
deconvolution. Detailed descriptions of the methodology used for each
step are given in the following subsections.



3.1 Preparation of the data 

Reconstructing a signal like the matter density field from the ob- 
served galaxy sample requires a model which relates the underly- 
ing matter field to the galaxy distribution. This model will define 
the inverse problem, which can be solved with a reconstruction al- 
gorithm. In this subsection, we describe how to prepare the input 
data in such a way that it is consistent with the data model under- 
lying the ARGO-code. 



3.1.1 Transformation of the data into comoving coordinates 

To apply a reconstruction algorithm which uses the correlation
function in comoving space, we first have to transform the redshift
distances into comoving distances $r$ for each galaxy by performing
the integral

$$ r = \int_0^z \frac{c\,\mathrm{d}z'}{H(z')} , \qquad (1) $$

with $c$ being the speed of light and $H(z)$ being the Hubble
parameter given by:

$$ H(z) = H_0 \sqrt{\Omega_m (1+z)^3 + \Omega_K (1+z)^2 + \Omega_\Lambda} , \qquad (2) $$

where we chose the concordance $\Lambda$CDM-cosmology with
$\Omega_m = 0.24$, $\Omega_K = 0$ and $\Omega_\Lambda = 0.76$
(Spergel et al. 2007). In addition, we assumed a Hubble constant
$H_0 = h$ km/s/Mpc with $h = 73$.

With this definition the three-dimensional galaxy positions
$(X, Y, Z)$ in comoving space are calculated as follows:

$$ X = r \cos(\delta) \cos(\alpha) , $$
$$ Y = r \cos(\delta) \sin(\alpha) , $$
$$ Z = r \sin(\delta) . \qquad (3) $$
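The transformation of Eqs. 1 to 3 can be sketched numerically as follows. This is only an illustration under the cosmological parameters quoted above: the function names, the trapezoidal integration and the number of integration steps are assumptions of this sketch and are not taken from the ARGO code.

```python
import numpy as np

C_KM_S = 299792.458                       # speed of light in km/s
H0 = 73.0                                 # Hubble constant in km/s/Mpc
OMEGA_M, OMEGA_K, OMEGA_L = 0.24, 0.0, 0.76

def hubble(z):
    """Hubble parameter H(z) of Eq. 2 in km/s/Mpc."""
    return H0 * np.sqrt(OMEGA_M * (1 + z) ** 3 + OMEGA_K * (1 + z) ** 2 + OMEGA_L)

def comoving_distance(z, n_steps=2048):
    """Comoving distance of Eq. 1 in Mpc (trapezoidal rule, hypothetical step count)."""
    zz = np.linspace(0.0, z, n_steps)
    f = C_KM_S / hubble(zz)
    return float(np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(zz)))

def to_cartesian(z, alpha_deg, delta_deg):
    """Comoving Cartesian position (X, Y, Z) of Eq. 3 for one galaxy."""
    r = comoving_distance(z)
    a, d = np.radians(alpha_deg), np.radians(delta_deg)
    return (r * np.cos(d) * np.cos(a),
            r * np.cos(d) * np.sin(a),
            r * np.sin(d))
```

For example, `to_cartesian(0.1, 180.0, 30.0)` returns the position of a galaxy at redshift 0.1 in the middle of the right ascension range used here.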



NYU-VAGC: http://sdss.physics.nyu.edu/vagc/

SDSS DR7: http://www.sdss.org/dr7

The comoving distance r is not to be confused with the r-band.

3.1.2 Supersampling step

Now, we can sort the galaxies onto a grid with a supersampling scheme,
which will permit us to apply a reconstruction scheme based on FFTs.
The much lower computational costs of FFTs permit us to tackle much
more ambitious matter reconstructions than have been attempted
previously with Wiener-filtering techniques. The main difficulty in
signal processing via FFT techniques arises from the need to represent
a continuous signal which extends to infinity on a finite discrete
grid. Various methods to approximate the real continuous signal by a
discrete representation have been proposed in the literature, e.g.
Nearest Grid Point (NGP), Cloud In Cell (CIC) or Triangular Shaped
Clouds (TSC) (see e.g. Hockney & Eastwood 1981). However, all of these
methods are only approximations to the ideal low-pass filter, and
introduce discretisation artifacts such as aliasing. For a detailed
discussion see e.g. Hockney & Eastwood (1981); Jing (2005); Cui et al.
(2008); Jasche et al. (2009). In recent years a number of methods have
been proposed to correct for these artifacts, especially for the
purpose of power-spectrum estimation (Jing 2005; Cui et al. 2008).
However, common methods to suppress these artifacts in the discretised
signals tend to be numerically expensive.

To circumvent this problem, Jasche et al. (2009) proposed a
supersampling technique, which is able to provide discrete signal
representations with strongly suppressed aliasing contributions at
reasonable computational cost. This method relies on a two-step
filtering process, where in the first step the signal is pre-filtered
by sampling the signal via the TSC method to a grid with twice the
target resolution. In our case we use a $1024^3$ grid. In a second
step the ideal discrete low-pass filter is applied to the pre-filtered
signal, allowing us to sample the low-pass filtered field at the lower
target resolution. In this fashion we obtain an aliasing free signal
sampled at a target resolution of $512^3$ cells (with equal number of
cells in each axis: $N_x \times N_y \times N_z$).
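The two-step process described above can be sketched in one dimension as follows. This is a minimal illustration, not the implementation of Jasche et al. (2009): the function names, the periodic box and the choice to zero the Nyquist mode of the target grid are assumptions of this sketch.

```python
import numpy as np

def tsc_assign_1d(x, n_cells, box):
    """Triangular Shaped Cloud assignment of points x in [0, box) to a periodic grid."""
    u = np.asarray(x) / box * n_cells      # positions in grid units
    i = np.rint(u).astype(int)             # nearest grid point
    d = u - i                              # offset in [-0.5, 0.5]
    counts = np.zeros(n_cells)
    np.add.at(counts, (i - 1) % n_cells, 0.5 * (0.5 - d) ** 2)
    np.add.at(counts, i % n_cells, 0.75 - d ** 2)
    np.add.at(counts, (i + 1) % n_cells, 0.5 * (0.5 + d) ** 2)
    return counts

def supersample_1d(x, n_target, box):
    """Number density on n_target cells from a TSC pre-filter on 2 * n_target cells."""
    n_fine = 2 * n_target
    # Step 1: pre-filter with TSC at twice the target resolution.
    fine = tsc_assign_1d(x, n_fine, box) * n_fine / box
    # Step 2: ideal low-pass filter, keeping only modes the target grid can represent.
    fk = np.fft.rfft(fine)[: n_target // 2 + 1].copy()
    fk[-1] = 0.0                           # assumption: drop the target Nyquist mode
    # Rescale because irfft normalises by n_target instead of n_fine.
    return np.fft.irfft(fk, n=n_target) * (n_target / n_fine)
```

Because the zero mode is untouched by the low-pass filter, the mean number density, and hence the total galaxy count, is preserved by the resampling.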

Let us define the observed galaxy sample as a point source
distribution $n^{\rm o}_{\rm p}(s)$ with coordinate $s$:

$$ n^{\rm o}_{\rm p}(s) = \sum_{i=1}^{N^{\rm o}_{\rm c}} \delta_D(s - s_i) , \qquad (4) $$

with $N^{\rm o}_{\rm c}$ being the total observed galaxy number count
and $\delta_D$ the Dirac-delta function. The process of putting the
galaxies on a regular grid is equivalent to a convolution in
real-space followed by a grid-point selection step according to
Hockney & Eastwood (1981):

$$ n^{\rm o}(s) = \Pi(s/H) \int \mathrm{d}s' \, K_S(s - s') \, n^{\rm o}_{\rm p}(s') , \qquad (5) $$

with $\Pi(s/H) = \sum_{n \in \mathbb{Z}} \delta_D(s/H - n)$, $H$ being
the grid-spacing and $K_S$ the supersampling kernel. We define the
resulting field as the observed galaxy number density $n^{\rm o}(s)$.
The observed galaxy number density is a function of the Cartesian
position in comoving space, but includes redshift distortion. For this
reason, we say that the distribution is in redshift-space, denoted by
the coordinate $s$.



3.1.3 Calculation of the three-dimensional mask: completeness 
on the sky and selection function 

To define the data vector we need to model the three dimensional mask.
We do this by processing the two-dimensional sky mask in several
steps. First, the sky mask or completeness on the sky
$w_{\rm SKY}(\alpha, \delta)$ is evaluated using the survey mask
provided in Sample dr6fix of the NYU-VAGC (see Section 2) on an
equidistant $\alpha \times \delta$ grid with 165000 x 75000 cells
having a resolution of 3.6'' both in right ascension and declination
(see panel (a) in Fig. 6). Then, we project the sky mask on a comoving
Cartesian $X \times Y \times Z$ grid containing $512^3$ cells.

This is done with the transformation given by Eq. 3, taking projected
values of the mask every 0.25 Mpc in the radial direction, which are
then assigned on the grid using the Nearest Grid Point (NGP) method
and normalized by the number of mask counts at that grid cell. The
analogous procedure is done with the radial completeness $w_r(z)$,
i.e. the selection function which is available as a function of
redshift.

Finally, we obtain the three dimensional mask $w(s)$ as a product of
the projected two dimensional mask, i.e. the completeness on the sky
$w_{\rm SKY}(\alpha, \delta)$, and the projected selection function
$w_r(s)$ (see Fig. 1 and panels a in figures 6, 8, 9 and 10 in section
5). We define $w(s) \le 1$.
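The construction of the three dimensional mask as a product of sky completeness and radial selection can be sketched as follows. Note that this sketch simplifies the procedure described above: instead of projecting mask values along the line of sight and averaging them with NGP, it evaluates both functions at the cell centres. The function name, the observer position and the callables `w_sky` and `w_r` are hypothetical placeholders.

```python
import numpy as np

def mask_3d(n, box, observer, w_sky, w_r):
    """Three dimensional mask w = w_sky(alpha, delta) * w_r(r) at the cell centres.

    n        : number of cells per axis
    box      : side length of the cube in Mpc
    observer : (x, y, z) position of the observer inside the cube (assumption)
    w_sky    : callable(alpha_deg, delta_deg) -> completeness on the sky
    w_r      : callable(r_mpc) -> radial selection function
    """
    centres = (np.arange(n) + 0.5) * box / n
    X, Y, Z = np.meshgrid(*(centres - o for o in observer), indexing="ij")
    r = np.sqrt(X ** 2 + Y ** 2 + Z ** 2)
    # Invert Eq. 3: delta = arcsin(Z / r), alpha = atan2(Y, X).
    ratio = np.divide(Z, r, out=np.zeros_like(r), where=r > 0)
    delta = np.degrees(np.arcsin(np.clip(ratio, -1.0, 1.0)))
    alpha = np.degrees(np.arctan2(Y, X)) % 360.0
    return w_sky(alpha, delta) * w_r(r)
```

A toy call could use a top-hat sky mask over the survey ranges quoted in section 2 and any decreasing function of distance for the radial part.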

3.2 Definition of the data model 

Let us define the observed galaxy overdensity field as:

$$ \delta^{\rm o}_{\rm g}(s) \equiv \frac{n^{\rm o}(s)}{\bar{n}} - w(s) , \qquad (6) $$

with $\bar{n}$ being the mean galaxy number density.

The mean galaxy number density on the grid $\bar{n}$ is defined by the
quotient of the total number of observed galaxies $N^{\rm o}_{\rm c}$
and the observed volume $V^{\rm o}$. Note that this assumes that the
observed volume is a fair sample of the Universe. We can then write:

$$ \bar{n} \equiv \frac{N^{\rm o}_{\rm c}}{V^{\rm o}} = \frac{\sum_{i=1}^{N_{\rm cells}} N^{\rm o}_{{\rm c},i}}{\int \mathrm{d}r \, w(r)} , \qquad (7) $$

with $N^{\rm o}_{{\rm c},i}$ being the number of observed galaxies at
cell $i$: $N^{\rm o}_{\rm c} = \sum_{i=1}^{N_{\rm cells}} N^{\rm o}_{{\rm c},i}$,
$N_{\rm cells}$ being the total number of cells and the observed
volume being defined by the integral $V^{\rm o} \equiv \int \mathrm{d}r \, w(r)$.
The relation between the expected galaxy number density
$\rho_{\rm g}(r)$ in a small volume $\Delta V$ around position $r$ and
the mean galaxy number density in the whole volume under consideration
$V$ is given by:

$$ \rho_{\rm g}(r) = \bar{n} \, (1 + \delta_{\rm g}(r)) , \qquad (8) $$

where $\delta_{\rm g}(r)$ is the galaxy overdensity field, which
describes the spatial density distribution of galaxies. Here we assume
that effects due to galaxy evolution are negligible in the observed
region, and, especially, that the mean number density is redshift
independent.

The observed quantity $\delta^{\rm o}_{\rm g}(s)$ defined in Eq. 6 has
to be related to the signal we seek to recover via a data model. This
relation is to be inverted by the reconstruction algorithm.
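On a grid, Eqs. 6 and 7 reduce to a few array operations. The following sketch assumes gridded galaxy counts and a mask sampled on the same cells; the function name and argument layout are hypothetical.

```python
import numpy as np

def observed_overdensity(counts, w, cell_volume):
    """Observed galaxy overdensity of Eq. 6 and mean density of Eq. 7.

    counts      : observed galaxy number counts per cell
    w           : three dimensional mask on the same grid (0 <= w <= 1)
    cell_volume : volume of one grid cell
    """
    n_obs = counts / cell_volume               # observed number density n^o
    v_obs = np.sum(w) * cell_volume            # observed volume, discretised integral of w
    nbar = counts.sum() / v_obs                # mean galaxy number density
    return n_obs / nbar - w, nbar
```

By construction the resulting overdensity sums to zero over the grid, since both terms of Eq. 6 integrate to the observed volume.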

The overdensity $\delta$ is not to be confused with the declination $\delta$.

© 0000 RAS, MNRAS 000, 000-000

Cosmic Cartography of the LSS with SDSS DR6

3.2.1 Physical model

In this section we describe the physical model which will enable us to
apply linear reconstruction methods and obtain an estimate of the
matter field valid on large-scales (>1 Mpc). Let us assume a
continuous matter field $\delta_{\rm m}(r)$ in comoving space $r$ as
well as a continuous galaxy field $\delta_{\rm g}(r)$. We model the
actual galaxies as being Poisson distributed according to this field
with an expectation density of $\bar{n} \, (1 + \delta_{\rm g}(r))$.
In general, the relation between the galaxy overdensity field
$\delta_{\rm g}$ and the underlying matter field $\delta_{\rm m}$ will
be given by a non-local and nonlinear bias operator. However, the
formalism we present here, without any further development, allows us
to account only for a non-local linear translation-invariant bias
operator $B(r - r')$ of the form:

$$ \delta_{\rm g}(r) = \int \mathrm{d}r' \, B(r - r') \, \delta_{\rm m}(r') . \qquad (9) $$

Note that this linear operator is known to fail at least at sub-Mpc
scales. Several non-local biasing models are described in the
literature, which are mainly used to correct for the shape of the
power-spectrum on large-scales (Tegmark et al. 2004; Hamann et al.
2008). We will carry this general bias through the algebraic
calculations. However, in this work we consider the galaxy field to be
a fair sample of the matter field. Thus, we assume the special case of
a linear constant bias equal to unity: $B(r, r') = \delta_D(r - r')$,
so that $\delta_{\rm g} = \delta_{\rm m}$. Nevertheless, any non-local
bias scheme of the form of Eq. 9 can be adapted without the need to
repeat the filtering. We show that one can easily deal with non-local
bias models in a final deconvolution step (see Eq. 30). As a result,
various posterior biasing assumptions can be applied based on this
reconstruction to test different biasing models.

We will also assume the existence of a redshift distortion operator
$Z(s, r)$, which transforms the density field from real-space into
redshift-space. Note that the redshift distortion operator cannot be a
linear operator, since it depends on the matter field
$\delta_{\rm m}(r)$. However, we will approximate it with a linear
redshift distortion operator $Z(s, r)$ here:

$$ \delta_{\rm g}(s) = \int \mathrm{d}r \, Z(s, r) \, \delta_{\rm g}(r) , \qquad (10) $$

and postpone a matter field dependent treatment, sampling the peculiar
velocity field as proposed in Kitaura & Enßlin (2008), for later work.

Let us further assume an additive noise term resulting in a data model
for the observed galaxy overdensity as:

$$ \delta^{\rm o,th}_{\rm g}(s) = w(s) \int \mathrm{d}r \, Z(s, r) \int \mathrm{d}r' \, B(r - r') \, \delta_{\rm m}(r') + \epsilon(s) , \qquad (11) $$

with $\epsilon$ being the noise term. The corresponding vector
representation of the data model can be approximated as:

$$ \delta^{\rm o,th}_{{\rm g},s} = R_{s,r} \, \delta_{{\rm m},r} + \epsilon_s , \qquad (12) $$

with the subscripts $r$ and $s$ denoting real-space and
redshift-space, respectively. The response operator can be defined by

$$ R_{s,r} \equiv W_s Z_{s,r} B_r , \qquad (13) $$

with $W_s$ being the three dimensional mask operator defined in
continuous space by $W(s, s') = w(s) \, \delta_D(s - s')$, $Z_{s,r}$
being the redshift distortion operator, and $B_r$ being the bias
operator. Now we need to specify a model for the noise term.
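The operator structure of Eqs. 12 and 13 can be illustrated with FFTs. The sketch below assumes, for illustration only, that both the bias and the linearised redshift distortion act as translation-invariant convolutions, so that they become multiplications in Fourier space, while the mask acts pointwise in configuration space. The function name and the Fourier-space kernels `bias_k` and `z_k` are hypothetical.

```python
import numpy as np

def apply_response(delta_m, w, bias_k=None, z_k=None):
    """Apply R = W Z B to a gridded matter field (noise-free part of Eq. 12).

    delta_m : matter overdensity on a periodic grid
    w       : three dimensional mask on the same grid
    bias_k  : optional Fourier-space bias kernel (None means unit bias)
    z_k     : optional Fourier-space redshift distortion kernel (None means identity)
    """
    field_k = np.fft.fftn(delta_m)
    if bias_k is not None:
        field_k = field_k * bias_k         # B: convolution in real-space
    if z_k is not None:
        field_k = field_k * z_k            # Z: linearised redshift distortion
    return w * np.fft.ifftn(field_k).real  # W: pointwise multiplication by the mask
```

With unit bias and no redshift distortion the response reduces to multiplying the field by the mask, which is the special case adopted in this work for the bias.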

3.2.2 Statistical model 

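Assuming that the galaxy distribution is generated by an inhomogeneous
Poissonian distribution, the galaxy number count $N_{\rm c}$ within a
volume $\Delta V$ around position $r$ is distributed as:

$$ N_{\rm c}(r) \sim P_{\rm Pois}(N_{\rm c}(r) \,|\, \lambda(r)) , \qquad (14) $$

with

$$ P_{\rm Pois}(N_{\rm c}(r) \,|\, \lambda(r)) = \frac{\lambda(r)^{N_{\rm c}(r)}}{N_{\rm c}(r)!} \exp(-\lambda(r)) , \qquad (15) $$

where the expected number of galaxy counts is given by the Poissonian
ensemble average $\lambda(r) \equiv \langle N_{\rm c}(r) \rangle_{\rm g}$
and is directly related to the expected galaxy density
$\rho_{\rm g}$ at that position:
$\rho_{\rm g}(r) = \langle N_{\rm c}(r) \rangle_{\rm g} / \Delta V$.
Here $\langle \{\,\} \rangle_{\rm g} \equiv \langle \{\,\} \rangle_{(N_{\rm c}|\lambda)} \equiv \sum_{N_{\rm c}=0}^{\infty} P_{\rm Pois}(N_{\rm c} \,|\, \lambda) \{\,\}$
denotes an ensemble average over the Poissonian distribution. We
further model the observational selection of $N^{\rm o}_{\rm c}(r)$
galaxies out of the $N_{\rm c}(r)$ present within the small volume
$\Delta V$ to be a binomial selection with an acceptance rate $w(r)$.
We then can write:

$$ N^{\rm o}_{\rm c}(r) \sim P_{\rm Bin}(N^{\rm o}_{\rm c}(r) \,|\, N_{\rm c}(r), w(r)) , \qquad (16) $$

with

$$ P_{\rm Bin}(N^{\rm o}_{\rm c}(r) \,|\, N_{\rm c}(r), w(r)) = \binom{N_{\rm c}(r)}{N^{\rm o}_{\rm c}(r)} \, w(r)^{N^{\rm o}_{\rm c}(r)} \, (1 - w(r))^{N_{\rm c}(r) - N^{\rm o}_{\rm c}(r)} . $$

The expected mean observed number of galaxies in the volume
$\Delta V$ is:

$$ \langle N^{\rm o}_{\rm c}(r) \rangle_w = w(r) \, N_{\rm c}(r) , \qquad (17) $$

where $\langle \{\,\} \rangle_w \equiv \langle \{\,\} \rangle_{(N^{\rm o}_{\rm c}|N_{\rm c},w)} \equiv \sum_{N^{\rm o}_{\rm c}=0}^{N_{\rm c}} P_{\rm Bin}(N^{\rm o}_{\rm c} \,|\, N_{\rm c}, w) \{\,\}$
represents the ensemble average over the binomial distribution with a
selection probability $w$. Consequently, one can model the observed
number of galaxies as a single Poissonian process:

$$ N^{\rm o}_{\rm c}(r) \sim P_{\rm Pois}(N^{\rm o}_{\rm c}(r) \,|\, \lambda^{\rm o}(r)) , \qquad (18) $$

with mean

$$ \lambda^{\rm o}(r) \equiv w(r) \lambda(r) = w(r) \langle N_{\rm c}(r) \rangle_{\rm g} = \langle \langle N^{\rm o}_{\rm c}(r) \rangle_{\rm g} \rangle_w . \qquad (19) $$

Note that the Poissonian and the binomial distributions commute with
each other.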
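The statement that a binomially selected Poissonian sample is again Poissonian with mean $w \lambda$ (Eqs. 18 and 19) can be checked with a short Monte Carlo experiment. The function name, the sample size and the seed are choices of this sketch.

```python
import numpy as np

def thinned_counts(lam, w, size, seed=0):
    """Draw Poissonian counts with mean lam, then select each galaxy with probability w."""
    rng = np.random.default_rng(seed)
    n_true = rng.poisson(lam, size=size)   # Eq. 14: true counts per cell
    return rng.binomial(n_true, w)         # Eq. 16: observed counts per cell

# For a Poissonian process the mean and the variance are both equal to w * lam.
n_obs = thinned_counts(lam=10.0, w=0.3, size=200000)
sample_mean, sample_var = n_obs.mean(), n_obs.var()
```

Both `sample_mean` and `sample_var` should scatter around `w * lam` (here 3), as expected for a single Poissonian process.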

3.2.3 Noise covariance and data autocorrelation matrix 

Let us define the noise covariance matrix, according to the
assumptions made in the previous section, as the shot noise resulting
from an inhomogeneous Poisson distribution for the galaxy distribution
$n(s)$, and a binomial distribution for describing the observation
process which reduces the fraction of observed galaxies following the
selection function. We then obtain an expression for the noise
covariance:

$$ N^{\rm SD}(s_1, s_2) \equiv \langle \epsilon(s_1) \epsilon(s_2) \rangle_{(\epsilon|\delta_{\rm m}, p_\epsilon)} = \langle \langle \epsilon(s_1) \epsilon(s_2) \rangle_{\rm g} \rangle_w $$
$$ = \frac{1}{\bar{n}^2} \left( \langle \langle n^{\rm o}(s_1) n^{\rm o}(s_2) \rangle_{\rm g} \rangle_w - \langle \langle n^{\rm o}(s_1) \rangle_{\rm g} \rangle_w \langle \langle n^{\rm o}(s_2) \rangle_{\rm g} \rangle_w \right) $$
$$ = \frac{1}{\bar{n}^2} \langle \langle n^{\rm o}(s_1) \rangle_{\rm g} \rangle_w \, \delta_D(s_1 - s_2) $$
$$ = \frac{1}{\bar{n}^2} \, w(s_1) \langle n(s_1) \rangle_{\rm g} \, \delta_D(s_1 - s_2) , \qquad (20) $$

where we have used the properties of the variance and mean of these
distribution functions and have added the superscript SD to denote
that this covariance matrix is signal-dependent (see section 2.5.3 and
appendix A in Kitaura & Enßlin 2008). Note that this noise covariance
is defined as the ensemble average of the correlation matrix of the
noise over all possible noise realizations, denoted by the subscript
$(\epsilon \,|\, \delta_{\rm m}, p_\epsilon)$ with $p_\epsilon$ being
a set of parameters which determine the noise. Here, we have neglected
the cell to cell correlation introduced by the gridding scheme we have
used (TSC) as the first step in our supersampling scheme.

The redshift distortion operator Z is not to be confused with the Z-axis in our Cartesian grid.

The noise covariance N is not to be confused with the galaxy number counts $N_{\rm c}$.

Having defined the data model, together with the noise model, we can
calculate the expected data autocorrelation matrix, which is defined
as the ensemble average over all possible galaxy realizations and
density realizations (cosmic variance) leading to the following
expression:



$$ \langle \langle \langle \delta^{\rm o,th}_{\rm g}(s_1) \, \delta^{\rm o,th}_{\rm g}(s_2) \rangle_w \rangle_{\rm g} \rangle_{\rm m} $$
$$ = w(s_1) w(s_2) \int \mathrm{d}r_1 \, Z(s_1, r_1) \int \mathrm{d}r_2 \, Z(s_2, r_2) $$
$$ \times \int \mathrm{d}r_1' \, B(r_1 - r_1') \int \mathrm{d}r_2' \, B(r_2 - r_2') \, \langle \delta_{\rm m}(r_1') \delta_{\rm m}(r_2') \rangle_{\rm m} $$
$$ + \langle N^{\rm SD}(s_1, s_2) \rangle_{\rm m} , \qquad (21) $$

with $\langle \{\,\} \rangle_{\rm m} \equiv \langle \{\,\} \rangle_{(\delta_{\rm m}|p_{\rm m})} \equiv \int \mathrm{d}\delta_{\rm m} \, P(\delta_{\rm m} \,|\, p_{\rm m}) \{\,\}$
being the ensemble average over all possible matter density
realizations with some prior distribution
$P(\delta_{\rm m} \,|\, p_{\rm m})$, with $p_{\rm m}$ being a set of
parameters which determine the matter field, say the cosmological
parameters. Note that this equation is only valid in the approximation
where the bias and the redshift distortion operators are linear. The
noise term in Eq. 21 has the following form:

$$ N^{\rm LSQ}(s_1, s_2) \equiv \langle N^{\rm SD}(s_1, s_2) \rangle_{\rm m} \equiv \langle \epsilon(s_1) \epsilon(s_2) \rangle_{(\delta_{\rm m}, \epsilon|p)} = \frac{1}{\bar{n}} \, w(s_1) \, \delta_D(s_1 - s_2) , \qquad (22) $$

since $\langle \langle n(r) \rangle_{\rm g} \rangle_{\rm m} = \langle \bar{n} \, (1 + \delta_{\rm g}(r)) \rangle_{\rm m} = \bar{n}$,
assuming again that the observed volume is a fair sample of the
Universe. The noise covariance has been denoted with the superscript
LSQ because it corresponds to the expression which is obtained by
performing the LSQ approach to derive the WF, i.e. minimising the
ensemble average of the squared difference between the real underlying
density field $\delta_{\rm m}$ and the LSQ estimator
$\delta^{\rm LSQ}_{\rm m}$ over all possible signal $\delta_{\rm m}$
and noise $\epsilon$ realizations:
$\langle (\delta_{\rm m} - \delta^{\rm LSQ}_{\rm m})^2 \rangle_{(\delta_{\rm m}, \epsilon|p)}$,
with $p$ being the joint set of parameters
$p \equiv \{p_{\rm m}, p_\epsilon\}$ (for a derivation see appendix B
in Kitaura & Enßlin 2008). We have also assumed that the cross terms
between the noise and the signal are negligible:
$\langle \delta_{\rm m} \epsilon^\dagger \rangle = 0$. This should be
further analyzed in future work. Higher order correlations between
noise and signal in fact exist, and can be exploited using schemes
like the Poissonian scheme proposed in Kitaura & Enßlin (2008). Note,
however, that we also consider a signal-dependent noise for the WF
(Eq. 20), which requires a model for the expected observed galaxy
number density $\langle \langle n^{\rm o}(s_1) \rangle_{\rm g} \rangle_w$
(for differences in the derivation see Kitaura & Enßlin 2008). We
restrict ourselves to the LSQ noise covariance model $N^{\rm LSQ}$
given by Eq. 22 in our application to the SDSS data (section 5). Note
that the LSQ representation of the Wiener-filter is a linear operator,
in contrast to the alternative formulation which depends on the signal
and thus is a nonlinear filter. We explore methods to deal with the
signal-dependent noise formulation with mock galaxy catalogues and
compare the results to the LSQ version of the Wiener-filter (see
section 4).

Note that by construction the data autocorrelation matrices
for the observed galaxy overdensity field and the theoretical
overdensity field are identical given the noise model in Eq. [20]:

m''''(si)S°/\s2)),U^ = m(s^)S°{s2)),U^. (23) 



3.3 Reconstruction algorithm 

In this section we propose a two step reconstruction process: first a 
Wiener-filter step and second a deconvolution step. 



3.3.1 Wiener-filtering

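First, we recover the galaxy field in redshift-space ($\delta_{g,s}$) by applying
the Wiener-filter. The version of the Wiener-filter we use can be
derived as follows. Let us approximate the posterior distribution
assuming a Gaussian prior and a Gaussian likelihood:

$$P(\delta_{g,s}\,|\,\delta^{\rm o}_{g,s},p) \propto \exp\left(-\frac{1}{2}\left[\delta_{g,s}^{\dagger}S_{g,s}^{-1}\delta_{g,s} + (\delta^{\rm o}_{g,s} - W_s\delta_{g,s})^{\dagger}N_s^{-1}(\delta^{\rm o}_{g,s} - W_s\delta_{g,s})\right]\right), \qquad (24)$$

with the signal autocorrelation matrix $S_{g,s} \equiv \langle\delta_{g,s}\delta_{g,s}^{\dagger}\rangle$ being
the inverse Fourier transform of the assumed model galaxy power-spectrum
in redshift-space, $\hat{S}_{g,s}(k,k') = (2\pi)^3 P_g^s(k')\,\delta_D(k-k')$,
and the hats denoting the Fourier transform of the signal autocorrelation
matrix. Note that the posterior distribution depends also on
a set of parameters $p$ which determine the power-spectrum $P_g^s(k)$.
The log-posterior distribution is then given by:

$$\log P(\delta_{g,s}\,|\,\delta^{\rm o}_{g,s},p) \propto \delta_{g,s}^{\dagger}S_{g,s}^{-1}\delta_{g,s} + (\delta^{\rm o}_{g,s} - W_s\delta_{g,s})^{\dagger}N_s^{-1}(\delta^{\rm o}_{g,s} - W_s\delta_{g,s}). \qquad (25)$$

After expanding the second quadratic form, the first two terms, which
are quadratic in $\delta_{g,s}$, can be combined into one term,
$\delta_{g,s}^{\dagger}(\sigma^2_{\rm WF})^{-1}\delta_{g,s}$, using the Wiener-variance
$\sigma^2_{\rm WF} \equiv (S_{g,s}^{-1} + W_s^{\dagger}N_s^{-1}W_s)^{-1}$. To find the mean of the posterior
distribution we seek an expression for the log-posterior of the
form:

$$\log P(\delta_{g,s}\,|\,\delta^{\rm o}_{g,s},p) \propto (\delta_{g,s} - \langle\delta_{g,s}\rangle_{\rm WF})^{\dagger}(\sigma^2_{\rm WF})^{-1}(\delta_{g,s} - \langle\delta_{g,s}\rangle_{\rm WF}), \qquad (26)$$

with $\langle\delta_{g,s}\rangle_{\rm WF} \equiv F_{\rm WF}\,\delta^{\rm o}_{g,s}$ being the mean after applying the
Wiener-filter $F_{\rm WF}$ to the data. Now the third and the fourth terms
of the expanded Eq. (25) can be identified with the corresponding terms in Eq. (26) as:

$$-\delta_{g,s}^{\dagger}W_s^{\dagger}N_s^{-1}\delta^{\rm o}_{g,s} = -\delta_{g,s}^{\dagger}(\sigma^2_{\rm WF})^{-1}F_{\rm WF}\,\delta^{\rm o}_{g,s} \qquad (27)$$

and

$$-\delta^{{\rm o}\,\dagger}_{g,s}N_s^{-1}W_s\delta_{g,s} = -\delta^{{\rm o}\,\dagger}_{g,s}F_{\rm WF}^{\dagger}(\sigma^2_{\rm WF})^{-1}\delta_{g,s}, \qquad (28)$$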



respectively. The remaining term depends only on the data and is
thus factorized in the posterior distribution function as part of the
evidence. From both Eq. (27) and Eq. (28) we conclude that the
Wiener-filter has the form

$$F_{\rm WF} = \sigma^2_{\rm WF}W_s^{\dagger}N_s^{-1} = (S_{g,s}^{-1} + W_s^{\dagger}N_s^{-1}W_s)^{-1}W_s^{\dagger}N_s^{-1}. \qquad (29)$$

The mean $\langle\delta_{g,s}\rangle_{\rm WF}$ of the posterior distribution defined by Eq. (24)
can be obtained by:

$$\langle\delta_{g,s}\rangle_{\rm WF} = (S_{g,s}^{-1} + W_s^{\dagger}N_s^{-1}W_s)^{-1}W_s^{\dagger}N_s^{-1}\delta^{\rm o}_{g,s}. \qquad (30)$$

We favor this signal-space representation of the Wiener-filter
over the equivalent and more frequently used data-space
representation in LSS reconstructions,
$\langle\delta_{g,s}\rangle_{\rm WF} = S_{g,s}W_s^{\dagger}(W_sS_{g,s}W_s^{\dagger} + N_s)^{-1}\delta^{\rm o}_{g,s}$ (see for example Zaroubi et al.
1995), because it avoids instabilities which otherwise arise in our
rapid algorithm for evaluating the filter.
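
The equivalence of the two representations can be checked numerically on a small toy problem. The following is a minimal sketch only, not the ARGO implementation (which relies on a Krylov-space inversion): the matrix size, the random seed, the toy matrices `S`, `W`, `N`, the data vector `d` and both function names are assumptions made for this illustration, and real-valued matrices are used so that the adjoint reduces to a transpose.

```python
import numpy as np

def wiener_signal_space(S, W, N, d):
    """Signal-space form: (S^-1 + W^T N^-1 W)^-1 W^T N^-1 d."""
    S_inv = np.linalg.inv(S)
    N_inv = np.linalg.inv(N)
    inverse_wiener_variance = S_inv + W.T @ N_inv @ W
    return np.linalg.solve(inverse_wiener_variance, W.T @ N_inv @ d)

def wiener_data_space(S, W, N, d):
    """Data-space form: S W^T (W S W^T + N)^-1 d."""
    return S @ W.T @ np.linalg.solve(W @ S @ W.T + N, d)

rng = np.random.default_rng(0)
n = 6
L = rng.normal(size=(n, n))
S = L @ L.T + n * np.eye(n)               # toy positive-definite signal covariance
W = np.diag(rng.uniform(0.2, 1.0, n))     # toy diagonal response (completeness)
N = np.diag(rng.uniform(0.5, 2.0, n))     # toy diagonal noise covariance
d = rng.normal(size=n)                    # toy data vector

mean_signal = wiener_signal_space(S, W, N, d)
mean_data = wiener_data_space(S, W, N, d)
print(np.allclose(mean_signal, mean_data))
```

For a small, well-conditioned system like this one, both expressions give the same posterior mean; the choice between them only matters for numerical stability and cost on large grids.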

Let us distinguish between the linear LSQ and the nonlinear
signal-dependent noise formulation of the Wiener-filter. The
first takes the matter field averaged noise covariance of Eq. [22],
$N = N^{\rm LSQ}$, and is the one used below when analyzing the SDSS data
(see section [5]). In the case of a signal-dependent noise, $N = N^{\rm SD}$, one
needs an estimate of the expected observed galaxy number density
$w(s)\lambda(s) = \langle\langle n^{\rm o}(s)\rangle_g\rangle_w$ (see Eq. [20] and section [3.4.2]). Such an
approach was followed by Erdogdu et al. (2004).

We use here the terminology introduced in Kitaura & Enßlin (2008).



3.3.2 Deconvolution step 

In the second reconstruction step, we deconvolve the galaxy field
$\langle\delta_{g,s}\rangle_{\rm WF}$ from the assumed redshift distortion and galaxy bias
operators, obtaining an estimate for the underlying matter field in
real-space:



$$\langle\delta_{m,r}\rangle_{\rm WF} = B^{-1}Z_s^{-1}\langle\delta_{g,s}\rangle_{\rm WF}. \qquad (31)$$



In this approximation, we can easily transform the reconstructed
galaxy field into the matter field by just performing a final
deconvolution with some scale-dependent bias of the form
$B(k,k') = b(k)\,\delta_D(k-k')$. As already mentioned above, our result should
not be restricted to a single arbitrarily chosen bias model. We
therefore choose to recover the galaxy field by assuming a bias equal
to unity, from which matter reconstructions for all possible linear
(and invertible) bias schemes can easily be constructed via Eq. [31].
Note that an alternative representation of the Wiener-filter, which
regularizes the bias and the redshift distortion operators when they
are not invertible, consists of including them in the response
operator (Eq. [13]) when calculating the Wiener-filter, leading to:



3.3.3 Redshift distortion operator 

Following Erdogdu et al. (2004) we define the power-spectrum
in redshift-space as the product of the power-spectrum in
real-space and an effective redshift distortion factor given by the
angle-averaged Kaiser factor $K(k,\mu)$ times the damping Lorentzian
factor $D(k,\mu)$:

$$P^s(k) = \langle K(k,\mu)D(k,\mu)\rangle_{\mu}\,P^r(k), \qquad (32)$$

with $\mu = k\cdot r/(|k||r|)$. The Kaiser factor is given by (see Kaiser
1987):

$$K(k,\mu) = (1+\beta\mu^2)^2, \qquad (33)$$

with $\beta$ being the redshift distortion parameter, which can be
approximated by $\beta \approx \Omega_m^{0.6}$ assuming a constant bias equal to unity and
neglecting dark energy dependences (see Lahav et al. 1991). The
Lorentzian damping factor is based on an exponential distribution
in real-space for the pairwise peculiar velocity field and is given by:

$$D(k,\mu) = \frac{1}{1 + (k^2\sigma_v^2\mu^2)/2}, \qquad (34)$$

with $k = |k|$ and $\sigma_v$ being the average dispersion velocity of the
galaxies, which we assume to be $\sigma_v \sim 500$ km s$^{-1}$ (see
for example Ballinger et al. 1996; Jing et al. 1998; Jing & Börner
2004; Li et al. 2006a).

Not to be confused with the supersampling kernel $K_S$.

We refer to Erdogdu et al. (2004) for the angle-averaged
expression of the product of the Kaiser factor and the damping factor.
Consequently, we introduce the angular averaged redshift distortion
operator defined as the square root of the factor in the previous
equation:

$$Z_s(k,k') = \sqrt{\langle K(k',\mu)D(k',\mu)\rangle_{\mu}}\;\delta_D(k-k'). \qquad (35)$$

By construction, this operator yields the correct power-spectrum
modification for the translation from real- to redshift-space.

Note that this approximation is valid up to second order statistics,
and gives only an effective solution to the redshift distortion
due to the angular averaging. A proper solution would require
a phase and direction dependent redshift distortion operator.
If we assume that the galaxy bias is unity, we can then
write the galaxy power-spectrum in redshift-space as
$P_g^s(k) = \langle K(k,\mu)D(k,\mu)\rangle_{\mu}\,P_m^r(k)$. Note that this reduces the range of
validity of our reconstruction from scales larger than the mesh
resolution (about 1 Mpc) to scales larger than about 10 Mpc. The power-spectrum in
real-space is given by a nonlinear power-spectrum that also
describes the effects of virialised structures with a halo term, as given
by Smith et al. (2003), at redshift $z = 0$. In addition to the
cosmological parameters presented in section [1.1] we assume a spectral
index $n_s = 1$. With each of the required operators defined, we can
now apply our reconstruction algorithm as we demonstrate in the
next section.
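
The angular averaging that enters the redshift distortion operator can be illustrated numerically. The sketch below is an assumption-laden example rather than the analytic expression of Erdogdu et al. (2004): it averages the product of the Kaiser and Lorentzian factors over $\mu$ with a simple midpoint rule, and the function names, the value $\beta = 0.5$ and the velocity dispersion expressed as a length (5 Mpc, i.e. 500 km/s divided by an assumed Hubble constant of 100 km/s/Mpc) are all hypothetical choices.

```python
import math

def kaiser(mu, beta):
    """Kaiser factor (1 + beta mu^2)^2."""
    return (1.0 + beta * mu**2) ** 2

def lorentz(k, mu, sigma_v):
    """Lorentzian damping factor 1 / (1 + k^2 sigma_v^2 mu^2 / 2)."""
    return 1.0 / (1.0 + 0.5 * (k * sigma_v * mu) ** 2)

def angle_averaged_factor(k, beta, sigma_v, n_steps=2000):
    """Midpoint-rule average of K*D over mu in [0, 1]; the integrand is even in mu."""
    h = 1.0 / n_steps
    total = 0.0
    for i in range(n_steps):
        mu = (i + 0.5) * h
        total += kaiser(mu, beta) * lorentz(k, mu, sigma_v)
    return total * h

def distortion_operator(k, beta, sigma_v):
    """Square root of the angle-averaged factor (diagonal element in Fourier space)."""
    return math.sqrt(angle_averaged_factor(k, beta, sigma_v))

beta, sigma_v = 0.5, 5.0          # hypothetical values; sigma_v in Mpc
for k in (0.0, 0.1, 1.0):         # wavenumbers in 1/Mpc
    print(k, distortion_operator(k, beta, sigma_v))
```

At $k = 0$ the damping factor is unity and the average reduces to $1 + 2\beta/3 + \beta^2/5$, which gives a quick consistency check of the numerical integration.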



3.4 Signal-dependent noise formulation of the Wiener-filter 

To apply the signal-dependent noise formulation of the Wiener-filter
one needs to find estimators for the expected density field in
the signal-dependent noise covariance (Eq. [20]). We require either a
good estimator for $\lambda^{\rm o}(r) \equiv \langle\langle N_c^{\rm o}(r)\rangle_g\rangle_w$, or for $\lambda(r) \equiv \langle N_c(r)\rangle_g$,
since $\lambda^{\rm o}(r) = w(r)\lambda(r)$.



3.4.1 Flat prior assumption

The inverse weighting estimator used in previous works to estimate
the noise covariance (see for example Erdogdu et al. 2004)
can be derived from the frequentist approach by assuming a flat
prior for the overdensity distribution or, equivalently, infinite
cosmic variance.

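Let us start with Bayes theorem:

$$P(\lambda^{\rm o}\,|\,N_c^{\rm o}) = \frac{P(N_c^{\rm o}\,|\,\lambda^{\rm o})\,P(\lambda^{\rm o})}{P(N_c^{\rm o})}. \qquad (36)$$

The flat prior is defined as $P(\lambda^{\rm o}) = c$, with $c$ being a constant.
The evidence is then given by

$$P(N_c^{\rm o}) = \int_0^{\infty}\mathrm{d}\lambda^{\rm o}\,P(N_c^{\rm o}\,|\,\lambda^{\rm o})\,c = c, \qquad (37)$$

since, for the Poissonian likelihood,

$$\int_0^{\infty}\mathrm{d}\lambda^{\rm o}\,P(N_c^{\rm o}\,|\,\lambda^{\rm o}) = \int_0^{\infty}\mathrm{d}\lambda^{\rm o}\,\frac{(\lambda^{\rm o})^{N_c^{\rm o}}e^{-\lambda^{\rm o}}}{N_c^{\rm o}!} = \frac{\Gamma(N_c^{\rm o}+1)}{N_c^{\rm o}!} = 1. \qquad (38)$$

Consequently, we obtain that the posterior distribution is equal to
the likelihood

$$P(\lambda^{\rm o}\,|\,N_c^{\rm o}) = P(N_c^{\rm o}\,|\,\lambda^{\rm o}). \qquad (39)$$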



Note that we deviate here from Erdogdu et al. (2004) in the order of the
angular averaging and square root. An inspection of the power-spectrum
corresponding to the reconstructions shows, however, that only the
prescription as implemented here leads to agreement with the nonlinear
Smith et al. (2003) power-spectrum.

Figure 1. Radial selection functions used for the mock tests. Note that the selection function used for the first mock test, $w_{\rm MOCK1}$, is identical to the radial
completeness of the DR6 catalogue, $w_{\rm DR6}$. The second selection function, $w_{\rm MOCK2}$, is calculated by weighting $w_{\rm DR6}(r)$ with the factor 100 Mpc/$r$ for
$r \geq 100$ Mpc.



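The maximum likelihood estimator $\lambda_{\rm max}$ is obtained by looking at
the extrema:

$$0 = \frac{\partial P(\lambda^{\rm o}_{\rm max}\,|\,N_c^{\rm o})}{\partial\lambda_{\rm max}} = \left(N_c^{\rm o}\,(w\lambda_{\rm max})^{-1}w - w\right)\frac{(w\lambda_{\rm max})^{N_c^{\rm o}}e^{-w\lambda_{\rm max}}}{N_c^{\rm o}!}, \qquad (40)$$

leading to:

$$\lambda_{\rm max} = \frac{N_c^{\rm o}}{w}. \qquad (41)$$

Note that the maximum estimator $\lambda_{\rm max}$ is not a valid estimator
for the noise covariance matrix, since it can become zero at cells
in which no galaxy count is present even if the cell belongs to the
observed region. The mean estimator $\lambda_{\rm mean}$ can be found by
performing the following integral:

$$\lambda^{\rm o}_{\rm mean} \equiv \int_0^{\infty}\mathrm{d}\lambda^{\rm o}\,\lambda^{\rm o}\,P(\lambda^{\rm o}\,|\,N_c^{\rm o}). \qquad (42)$$

Thus, we have:

$$\lambda_{\rm mean} = \frac{\lambda^{\rm o}_{\rm mean}}{w} = \frac{1}{w}\,(N_c^{\rm o} + 1). \qquad (43)$$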

The mean estimator $\lambda_{\rm mean}$ gives a regularized solution with
respect to the maximum estimator $\lambda_{\rm max}$, overcoming the problem of
having zero noise at cells with zero observed number counts. Both
estimators, however, rely on the flat prior assumption, which can
be dominated by the shot noise for low completeness. This can be
a problem when the reconstruction is performed on a fine mesh
with extremely low completeness. For this reason, we test the SD
Wiener-filter with an alternative scheme presented in the next
section.
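
The two flat-prior estimators can be compared on a single cell with a short numerical sketch. This is an illustration under stated assumptions, not part of the ARGO code: the function names, the integration limit and the example values of the number count and completeness are hypothetical, and the posterior mean is evaluated with a simple midpoint rule to check the closed form $N_c^{\rm o} + 1$.

```python
import math

def poisson_likelihood(n_obs, lam_obs):
    """Poisson probability of n_obs counts given the expected observed count lam_obs."""
    return lam_obs**n_obs * math.exp(-lam_obs) / math.factorial(n_obs)

def lambda_max(n_obs, w):
    """Maximum estimator: inverse weighting of the observed count."""
    return n_obs / w

def lambda_mean(n_obs, w):
    """Mean estimator under the flat prior: (n_obs + 1) / w."""
    return (n_obs + 1) / w

def posterior_mean_numeric(n_obs, upper=60.0, n_steps=60000):
    """Midpoint-rule integral of lam * P(n_obs | lam) over lam in [0, upper]."""
    h = upper / n_steps
    total = 0.0
    for i in range(n_steps):
        lam = (i + 0.5) * h
        total += lam * poisson_likelihood(n_obs, lam)
    return total * h

n_obs, w = 3, 0.2                       # hypothetical cell: 3 galaxies, 20% completeness
print(lambda_max(n_obs, w), lambda_mean(n_obs, w))
print(posterior_mean_numeric(n_obs))    # numerical counterpart of n_obs + 1
print(lambda_max(0, w), lambda_mean(0, w))
```

The last line shows the regularizing effect discussed above: an empty but observed cell gets a vanishing maximum estimate, whereas the mean estimate stays positive.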



3.4.2 Statistically unbiased Jackknife-like scheme 

The Jackknife-like scheme we present here and test in the next section
produces subsamples from a galaxy distribution with selection
function effects which are statistically unbiased with respect to the
underlying mean number density, having a noise term with a structure
function depending only on $\lambda(r)$. The first step of the scheme consists
of generating a subsample $N_c^{\alpha}(r)$ using the binomial distribution given the
observed number counts and the selection probability $\alpha/w(r)$, with
a tunable parameter $\alpha < \min(w(r))$. The subsample is then Poisson
distributed with mean $\alpha\lambda(r)$:

$$N_c^{\alpha}(r) \sim P_{\rm Pois}(N_c^{\alpha}(r)\,|\,\alpha\lambda(r)).$$

In the second step the subsample $N_c^{\alpha}(r)$ is inverse weighted with
$\alpha$:

$$N_c'(r) = \frac{1}{\alpha}\,N_c^{\alpha}(r). \qquad (44)$$

Note that the ensemble average over all possible $\alpha$ realizations
leads to the mean number density $\lambda(r)$:

$$\langle\langle N_c'(r)\rangle_{(N_c|\lambda)}\rangle_{\alpha} = \frac{1}{\alpha}\langle\langle N_c^{\alpha}(r)\rangle_{(N_c|\lambda)}\rangle_{\alpha} = \langle N_c(r)\rangle_{(N_c|\lambda)} = \lambda(r). \qquad (45)$$

Here, $\langle\{\,\}\rangle_{\alpha}$ is a binomial average with acceptance frequency $\alpha$.
The estimator for $\lambda^{\rm o}(r)$ is then $\langle\lambda^{\rm o}(r)\rangle_{\rm JK} = w(r)\,N_c'(r)$, with the subscript
JK standing for the Jackknife estimator. We test the estimator
proposed here to sample the noise covariance (see section [4]).
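
The unbiasedness of the two-step scheme can be illustrated with a small Monte Carlo experiment for a single cell. The sketch below is a toy under explicit assumptions: the function names, the values of the mean density, completeness and $\alpha$, the number of trials and the seeds are all hypothetical, and the Poisson draw uses Knuth's multiplication method, which is only adequate for small means.

```python
import math
import random

def poisson_draw(mean, rng):
    """Draw a Poisson variate with Knuth's multiplication method (small means only)."""
    limit = math.exp(-mean)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def jackknife_estimate(n_obs, w, alpha, rng):
    """Keep each observed galaxy with probability alpha / w, then weight by 1 / alpha."""
    if not 0.0 < alpha <= w:
        raise ValueError("alpha must satisfy 0 < alpha <= w")
    kept = sum(1 for _ in range(n_obs) if rng.random() < alpha / w)
    return kept / alpha

def simulate_mean_estimate(lam, w, alpha, n_trials, seed=1):
    """Average the Jackknife-like estimate over many Poisson-binomial realizations."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_trials):
        n_true = poisson_draw(lam, rng)                              # Poisson sampling
        n_obs = sum(1 for _ in range(n_true) if rng.random() < w)    # binomial selection
        total += jackknife_estimate(n_obs, w, alpha, rng)
    return total / n_trials

estimate = simulate_mean_estimate(lam=4.0, w=0.5, alpha=0.25, n_trials=20000)
print(estimate)   # scatters around the input mean density lam = 4
```

Multiplying such an estimate by the completeness of the cell gives the corresponding estimate of the expected observed number count.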



4 QUALITY VALIDATION OF THE RADIAL 
SELECTION FUNCTION TREATMENT 

In this section, we evaluate the quality of the reconstruction method
under several incompleteness conditions. We restrict the study to a
mesh of $128^3$ cells for a cube with 500 Mpc side length and ignore
bias and redshift distortion effects. The necessity of performing a
reconstruction step to make further studies of the large-scale
structure is addressed. Simpler schemes in which the galaxies are
just gridded and the resulting field smoothed are shown to lead to
significantly worse estimates of the matter field.

Figure 2. Mock test 1 using $w_{\rm MOCK1}$. Input galaxy sample ~20% of the complete galaxy sample. Slices around Y ~ 270 Mpc through a 500 Mpc cube
box with a $128^3$ grid for different quantities without smoothing. Panel (a): observed mock galaxy overdensity field before correcting for the incompleteness.
Panel (b): DR6 radial completeness corresponding to this test. Panel (c): underlying complete mock galaxy field. Panel (d): inverse weighting scheme applied
to the sample represented in (a). Note that panels (a), (c) and (d) were created taking the mean over 10 neighboring slices around the slice at Y ~ 270 Mpc,
corresponding to a thickness of 40 Mpc.

For this study, we consider a homogeneous subsample of $10^6$
galaxies in a 500 Mpc cube box, selected at random from the mock
galaxy catalogue by De Lucia & Blaizot (2007), which is based on the
Millennium Simulation (Springel et al. 2005). We define the $10^6$ galaxy
sample as our complete sample. Then, we generate two incomplete
samples by radially selecting the galaxies according to two
different radial completeness functions, $w_{\rm MOCK1}$ and $w_{\rm MOCK2}$ (see
Fig. [1]). This is done by drawing a uniform random number between
0 and 1 for each mock galaxy and selecting the galaxy depending
on whether the drawn number is above or below the value of the
completeness at the corresponding distance to the observer. Note
that this ensures a perfect binomial observation process, treating all
the galaxies independently of their luminosity and thus avoiding the
problem of galaxy biasing. The observer is placed in both cases at
a position in the box equivalent to that of the real observer in the
application to the observed DR6 data (section [5]), namely at X=0 Mpc,
Y=250 Mpc, and Z=20 Mpc. Note that the arbitrary coordinates of
the mock data range from 0 to 500 Mpc in each direction X, Y, and
Z.

Figure 3. Mock test 1 using $w_{\rm MOCK1}$. Input galaxy sample ~20% of the complete galaxy sample. Slices around Y ~ 270 Mpc through a 500 Mpc cube box
with a $128^3$ grid for different quantities without smoothing. Panel (a): LSQ Wiener reconstruction to correct for the shot noise of the mock galaxy field taking
the complete sample. Panel (b): LSQ Wiener reconstruction of the incomplete mock galaxy field taking into account the averaged shot noise and the radial
selection function. Panel (c): mean over 200 Bayesian Wiener reconstructions to correct for the shot noise of the mock galaxy field taking the complete sample.
Panel (d): mean over 200 SD Wiener reconstructions of the incomplete mock galaxy field taking into account shot noise and the radial selection function. Note
that all the panels were created taking the mean over 10 neighboring slices around the slice at Y ~ 270 Mpc, i.e. over a slice of thickness 40 Mpc.
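
The binomial radial selection used to build the incomplete mock samples can be sketched as follows. This is a minimal illustration, not the code used for the tests: the uniformly random toy positions stand in for the mock catalogue, `toy_completeness` is a hypothetical stand-in for the tabulated selection functions of Fig. 1, and the function names and seed are assumptions. A galaxy is kept when its uniform random draw falls below the completeness at its distance from the observer.

```python
import math
import random

OBSERVER = (0.0, 250.0, 20.0)   # observer position in the mock box (Mpc)

def toy_completeness(r):
    """Hypothetical radial completeness: 1 inside 100 Mpc, then 100 Mpc / r."""
    return 1.0 if r < 100.0 else 100.0 / r

def select_galaxies(positions, observer, completeness, rng):
    """Keep a galaxy when its uniform draw in [0, 1) is below the completeness."""
    selected = []
    for pos in positions:
        r = math.dist(pos, observer)
        if rng.random() < completeness(r):
            selected.append(pos)
    return selected

rng = random.Random(7)
# Toy "complete sample": uniformly distributed points in a 500 Mpc box.
galaxies = [tuple(rng.uniform(0.0, 500.0) for _ in range(3)) for _ in range(5000)]
selected = select_galaxies(galaxies, OBSERVER, toy_completeness, rng)
print(len(selected), "of", len(galaxies), "toy galaxies kept")
```

Because every galaxy is accepted or rejected independently of its properties, the selection is purely binomial, which is the property the mock tests rely on.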



We consider the LSQ formulation of the Wiener-filter, which
is a linear filter with a homogeneous noise term multiplied by
a structure function given by the selection function (Eq. [22]),
the signal-dependent noise formulation, which is a nonlinear filter
as it depends on the signal (see Eq. [20]), and the inverse weighting
scheme. In addition to the Wiener-reconstruction methods, we define
an inverse weighting scheme (IW) to estimate the underlying
matter field as follows: first, each galaxy is weighted with the
inverse of the completeness at its location; then the galaxy sample is
gridded according to the corresponding particle masses (we use our
supersampling scheme to suppress aliasing); and finally the resulting
field is convolved with different smoothing kernels. The first
part of this scheme, leaving the smoothing for a later step, can be
summarized by the following equation:

$$\langle n(r)\rangle_{\rm IW} \equiv \Pi\!\left(\frac{r}{H}\right)\int\mathrm{d}r'\,K_S(r-r')\,\frac{1}{w(r')}\,n_g^{\rm o}(r'), \qquad (46)$$

where we have denoted the corresponding estimator by the angles
$\langle\{\,\}\rangle_{\rm IW}$. Note that the completeness cannot be zero at a position
in which a galaxy was observed. In order to make a quantitative
comparison between the two Wiener-filtering methods and the
inverse weighting method, a true underlying field $\delta^{\rm true}$ needs to be
defined. Since the inverse weighting scheme does not correct for the
shot noise, we will compare with the complete mock galaxy sample
(see panel (c) in Fig. [2]) after smoothing on different scales. Note
that a consistent comparison for this case is difficult, since the shot
noise varies with the different galaxy samples and with the distance
to the observer. For the Wiener reconstruction case study we define
the true underlying matter field $\delta^{\rm true}$ as the resulting Wiener
reconstruction taking the complete mock galaxy sample (see panel (e) in
Fig. [2]). Note that the true field thus also differs between our two
Wiener filtering schemes. We will denote the reconstructed fields
with each method as $\delta^{\rm rec}$.

Figure 4. Statistical cell to cell correlation between the mock true density field $\delta^{\rm true}$ and the reconstructed density field $\delta^{\rm rec}$ at different scales for our first test
case using $w_{\rm MOCK1}$. Input galaxy sample ~20% of the complete galaxy sample. Also indicated: the statistical correlation coefficient $r$, the Euclidean distance
$D_{\rm Euc}$ and the Kullback-Leibler distance $D_{\rm KL}$, first for all the sample (black dots), then for the sample in the radial comoving radius range between 200 and
400 Mpc (green dots), and finally in the range between 0 and 200 Mpc (red dots) away from the observer. The upper panels correspond to the comparison
without smoothing and the lower panels after smoothing with a smoothing radius of $r_S = 5$ Mpc. Comparison between the complete mock galaxy field (in
this case: $\delta^{\rm true}$) and the inverse weighting scheme applied to the incomplete sample (in this case: $\delta^{\rm rec}$) without smoothing (panel (a)) and after smoothing (panel (d)).
Panels (b) and (e) represent the comparison between the average shot noise corrected complete mock galaxy field (in this case: $\delta^{\rm true}$) and the LSQ Wiener
reconstruction of the incomplete sample (in this case: $\delta^{\rm rec}$) with the corresponding scale at bottom or top. Panels (c) and (f) represent the comparison between
the local shot noise corrected complete mock galaxy field (in this case: $\delta^{\rm true}$) and the SD Wiener reconstruction of the incomplete sample using the Jackknife
estimator (in this case: $\delta^{\rm rec}$) with the corresponding scale at bottom or top.



The cell to cell plot of the reconstruction against the true
density field is highly informative because the scatter in the
alignment of the cells around the line of perfect correlation (45° slope)
gives a qualitative measure of the goodness of the reconstruction. In
general, the quality of the recovered density map is better represented
by the Euclidean distance between the true and the reconstructed signal
(see Kitaura & Enßlin 2008). The ensemble average of this quantity
over all possible density realizations can also be regarded as
an action or loss function that leads to the Wiener-filter through
minimization (see Kitaura & Enßlin 2008). Here we introduce the
Euclidean distance:



4.1 Statistical correlation measures 

To give a quantitative measurement of the quality of the
reconstructions, we define the correlation coefficient $r$ between the
reconstructed density field $\delta^{\rm rec}$ and the true density field $\delta^{\rm true}$:



/ erec rtrue\ 

r{5 ,5 ) 



(47) 



$$D_{\rm Euc}(\delta^{\rm rec},\delta^{\rm true}) \equiv \sqrt{\frac{1}{N_{\rm cells}}\sum_{i}^{N_{\rm cells}}\left(\delta_i^{\rm rec}-\delta_i^{\rm true}\right)^2}, \qquad (48)$$

with $N_{\rm cells}$ being the number of cells ($128^3$ for the mock tests).

Not to be confused with the comoving distance $r$.

Figure 5. Same as Fig. [4] but using $w_{\rm MOCK2}$.

Let us, in addition, define the normalized Kullback-Leibler distance (see
Kullback & Leibler 1951) as



1 + 



(49) 

In our analysis we also compute smoothed versions of the density
field, convolving it with a Gaussian kernel given by:

$$G(r,r_S) = \exp\left(-\frac{|r|^2}{2r_S^2}\right), \qquad (50)$$

with $r_S$ being the smoothing radius.
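
The statistical measures of this section can be sketched with NumPy on a toy field. Several choices here are assumptions rather than the definitions used in the paper: the correlation coefficient is taken to be the standard Pearson coefficient, the smoothing kernel is normalised to unit integral so that the mean of the field is preserved, the convolution is done with periodic boundaries through the FFT, and the toy fields, function names, grid size and seed are hypothetical.

```python
import numpy as np

def correlation_coefficient(rec, true):
    """Pearson correlation coefficient between two fields (assumed definition)."""
    return np.corrcoef(rec.ravel(), true.ravel())[0, 1]

def euclidean_distance(rec, true):
    """Root of the mean squared cell-to-cell difference."""
    return np.sqrt(np.mean((rec - true) ** 2))

def gaussian_smooth(field, box_length, r_s):
    """Periodic convolution with a unit-integral Gaussian of smoothing radius r_s."""
    n = field.shape[0]
    k1d = 2.0 * np.pi * np.fft.fftfreq(n, d=box_length / n)
    kx, ky, kz = np.meshgrid(k1d, k1d, k1d, indexing="ij")
    k2 = kx**2 + ky**2 + kz**2
    kernel_k = np.exp(-0.5 * k2 * r_s**2)    # Fourier transform of the Gaussian
    return np.real(np.fft.ifftn(np.fft.fftn(field) * kernel_k))

rng = np.random.default_rng(0)
true_field = rng.normal(size=(16, 16, 16))                       # toy "true" field
rec_field = true_field + 0.5 * rng.normal(size=(16, 16, 16))     # toy "reconstruction"
smoothed = gaussian_smooth(true_field, box_length=500.0, r_s=50.0)

print(correlation_coefficient(rec_field, true_field))
print(euclidean_distance(rec_field, true_field))
```

Applying `gaussian_smooth` to both fields before computing the two measures mimics the comparison at different smoothing scales.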



4.2 First mock test 

In the first mock test we try to emulate the same completeness
conditions as given in the observed DR6 sample. For that, we take the
complete mock galaxy catalogue ($10^6$ galaxies) and select, according
to the DR6 radial selection function ($w_{\rm MOCK1} = w_{\rm DR6}$), a
subsample leaving about 20% of the total number of galaxies (218020)
(see Fig. [2]). The DR6 radial selection function can be seen as the
black line in Fig. [1]. A section through the box showing the
completeness can also be seen in panel (b) of Fig. [2]. The observer can
be identified as being at the center of the spherical shells with equal
completeness. The resulting overdensity field after applying this
selection function to the complete mock sample can be seen in panel
(a) of Fig. [2]. Note that we show here the mock observed galaxy
field setting $w = 1$ in Eq. [6] in order to clearly see the selection
effects. In the following, the discrete galaxy field (including Poisson
noise) is represented in red and the noise corrected field is
represented in blue. We define the complete mock galaxy
field including Poisson noise (panel (c) in Fig. [2]) as the true galaxy
density field for the inverse weighting scheme. The corresponding
noise corrected fields using the LSQ WF (panel (a) in Fig. [3]) and
the SD WF (panel (c) in Fig. [3]) are defined as the true galaxy
density fields for the Wiener reconstructions. The true dark matter field
is approximately related to this via Eq. [9]; however, here we want to
exclude the complication of galaxy biasing.

Panel (d) in Fig. [2] shows the result after applying the inverse
weighting scheme. Panels (b) and (d) of Fig. [3] show the respective
reconstructions using the LSQ and the SD WF. One can clearly see
the noisy reconstruction produced by the inverse weighting scheme
for structures located at large distances from the observer, in contrast
to the smoother estimation made by the Wiener-filtering schemes.
The SD WF was applied for the complete galaxy sample using our 
statistically unbiased Jackknife-like scheme with an a parameter of 
10~^. The means after 200 reconstructions are shown in panels (c) 
and (d) for the complete and the selected samples, respectively. The
corresponding statistical analysis can be seen in Fig. [4]. The cell to
cell correlation plots show the tendency of the inverse weighting
scheme to overestimate the density, while the opposite is true, in a
significantly more moderate way, when applying the Wiener-filter.
In the case without smoothing (a mesh of cell size ~3.9 Mpc;
panels (a) and (d) in Fig. [4]) the qualitative and quantitative difference
between the methods is very large, showing a significantly better
correlation coefficient and lower Euclidean and Kullback-Leibler
distances for the Wiener reconstructions than for the inverse weighting
scheme. Only when the fields are smoothed with a Gaussian of
radius $r_S = 5$ Mpc does the difference between the matter field
estimators drop. With this smoothing the statistical correlation
coefficients are similar for the Wiener-filter and the inverse weighting
scheme. However, the Euclidean and Kullback-Leibler distances
remain lower for the Wiener-filter (WF) reconstructions (see
Fig. [4]).

4.3 Second mock test 

For the second mock test we modify the DR6 selection function
to drop faster towards larger radii, leaving less than 10% of the
galaxies (87220), by weighting $w_{\rm DR6}(r)$ with the factor 100 Mpc/$r$
for $r \geq 100$ Mpc. The corresponding radial selection function
($w_{\rm MOCK2}$) can be seen as the dashed line in Fig. [1]. The dramatic
difference from DR6 completeness can be seen, using LSQ and
SD formulations respectively. The noisy reconstruction produced
by the inverse weighting scheme for structures located at large
distances from the observer is now even more visible than in the previous
test. Cells far away from the observer are excessively weighted.
The Wiener-filter, in contrast, gives a smoother and more conservative
estimation in regions in which the data are more incomplete.
However, it remains sharp in regions where the information content
is high (see structures close to the observer).

The corresponding statistical analysis can be seen in Fig. [5].
The tendency of the inverse weighting scheme to overestimate the
density is now extreme. Smoothing helps to raise the correlation
coefficient values and to decrease the Euclidean and Kullback-Leibler
distances. The distances remain, however, clearly above the ones
achieved with the Wiener-filter schemes.



5 MATTER FIELD RECONSTRUCTIONS OF THE SDSS 
DR6 

This work presents the first application of the ARGO-code to
observational data. This yields the matter field reconstruction of the
SDSS DR6 in the main area of the survey, which is located in the
northern Galactic cap, on a comoving cube of side 500 Mpc with
$512^3$ cells.

In this section we describe a few remarkable features in the
reconstructed matter field, demonstrating the quality of the
reconstruction and the scientific potential for future applications. First,
we discuss the mask and the projected three dimensional
reconstruction without smoothing and after smoothing with a Gaussian
kernel with smoothing radii of $r_S = 5$ Mpc and $r_S = 10$ Mpc, as
displayed in Fig. [6]. We then describe the largest structures in the
nearby Universe, in particular the Sloan and CfA2 Great Walls (see
Gott et al. 2005; Geller & Huchra 1989). Later, we analyze void or
cluster detections which can be made with this kind of work.
Finally, we analyze the statistical distribution of matter.

5.1 Mask and completeness 

The sky mask for the region is shown in panel (a) of Fig. [6]. The
high resolution (36'' in both $\alpha$ and $\delta$) permits us to visualize the
plates of the SDSS, with the intersection of several plates leading to
higher completeness. The mask is divided into three patches: one
small beam at high declination and right ascension angles and two
wide regions. All the patches together cover almost a quarter of the
sky. Between the two wider regions there is a large gap and there
are several additional smaller gaps inside the patches. Such a
complex mask is an interesting problem for the ARGO-code. It allows
us to test whether it can properly handle unobserved regions with
zero completeness. Slices of the three dimensional mask calculated
as the product of the completeness on the sky and the selection
function (see section 13.1.31 ) are presented in panel (a) of Figs. [U 



[21 [TOl and panel (e) of Fig. [9l In these plots one can see how the 
selection function leads to a decrease of the completeness in the
radial direction. Note that the observer is located at (0,0,0) in our
Cartesian coordinate system. We can see in panel (a) of Fig. [8] that
the completeness rapidly reaches its maximum at around 110 Mpc
distance from the observer and decreases at larger radii to values
below 10%. In the next section we show how remarkably
homogeneous structures are recovered in our reconstruction, independent
of the distance from the observer and despite the low completeness
values at large distances. We confirmed the same behavior with
additional reconstructions of larger volumes, for boxes up to side
lengths of around 750 Mpc. For even larger volumes of 1 Gpc size,
not shown here, however, the main sample becomes too sparse and
only the large-scale structures are recovered. Including the three
dimensional completeness for the SDSS DR6 data (see section [3.1.3])
in Eq. [7] we obtain a mean galaxy density of about 0.05.

5.2 Mapping the Sloan and the Cf A2 Great Wall 

The Sloan Great Wall is one of the largest structures known in our
local Universe, although it is not a gravitationally bound object (see
Gott et al. 2005). It extends for about 400 Mpc (for a detailed
study see Deng et al. 2006) and is located around 300 Mpc distant
from Earth. In Fig. [7] we represent different radial shells, picking
out the structures of the Sloan Great Wall, which extends from
about 140° to 210° (-150° in Fig. [6]) in right ascension and lies
within a few degrees around declination $\delta$ ~ 0°. In these
shells other complex structures can be observed at higher
declinations, showing filaments, voids and clusters of galaxies. Moreover,
the region which has not been observed, lying outside the mask (see
panel (a) in Fig. [6]), is predicted to be filled with structures by the
reconstruction method according to our assumed correlation function
(see section [3.3.3]). The Sloan Great Wall can also be seen in Fig. [8]
almost in its full extent. We can see how ARGO recovers the matter
field, balancing the structures with low signal to noise ratio against
those with a higher signal, leading to a homogeneously distributed
field, meaning that clusters close to and far from the observer are
both well represented. Only where the signal to noise drops below
unity do structures tend to blur, as can be observed in the upper
parts of the reconstruction shown in Fig. [9].

The CfA2 Great Wall is also one of the largest structures known
in our local Universe and contains the Coma Cluster (Abell 1656)
at its center (see Geller & Huchra 1989). We can clearly see the
Coma Cluster in the projected reconstruction without smoothing,
being the big spot at right ascension $\alpha$ ~ 195° (-165° in Fig. [6])
and declination $\delta$ ~ 28° in panel (b) of Fig. [6], located at a
distance of ~100 Mpc from the observer (see Thomsen et al. 1997;
Carter et al. 2008). The CfA2 Great Wall cannot be seen in its full
extent in Fig. [8] because it reaches higher declination angles than
selected in the plot. However, it can be partially seen as an elongated
matter structure at about 100 Mpc distance from the observer, i.e. at
around -100 Mpc in the X-axis in Fig. [8]. Large filamentary
structures are present even after smoothing with a Gaussian kernel with
a smoothing radius of $r_S = 10$ Mpc (see panel (d) in Fig. [8]). The
second major cluster of the Coma super-cluster is the Leo Cluster
(Abell 1367) at a distance of ~94 Mpc ($z$ ~ 0.022), with equatorial
coordinates $\alpha$ ~ 176° and $\delta$ ~ 20°. It is weakly detected in


Note that the extension of the Sloan Great Wall is usually given in
luminosity distance, which can be around 40 Mpc larger than in comoving
distance as we represent it here.

Figure 6. Panel (a): completeness of the observed patches on the sky. Shown are projections on the sky of the three dimensional matter field reconstruction,
including the deconvolution with a redshift distortion operator and divided by the number of line-of-sight grid-points used for the calculation to obtain a
mean density field on the sky: without smoothing (panel (b)), after a convolution with a Gaussian kernel with a smoothing radius of $r_S = 5$ Mpc (panel (c))
and $r_S = 10$ Mpc (panel (d)). Note that the longitude angles -90°, -120°, -150° and -180° correspond to 270°, 240°, 210° and 180° right ascension angles,
respectively, with the positive angles being equal. For a general right ascension angle $\alpha$ the longitude is calculated as $\alpha - 360°$ for $\alpha \geq 180°$. The latitude
angles are identical to the declination angles.






Supercluster   Cluster    Abell ID   Right ascension (longitude in Fig. 6)   Declination   Redshift
Coma           Coma       A1656      195° (-165°)                            28°           0.0231
Coma           Leo        A1367      176° (176°)                             20°           0.0220
Hercules                  A2040      228° (-132°)                            7°            0.0448
Hercules                  A2052      229° (-131°)                            7°            0.0338
Hercules                  A2063      231° (-129°)                            9°            0.0341
Hercules       Hercules   A2151      241° (-119°)                            18°           0.0354
Hercules                  A2147      241° (-119°)                            16°           0.0338
Hercules                  A2152      241° (-119°)                            16°           0.0398
Hercules                  A2148      241° (-119°)                            25°           0.0418
Hercules                  A2162      243° (-117°)                            29°           0.0310
Hercules                  A2197      247° (-113°)                            41°           0.0296
Hercules                  A2199      247° (-113°)                            40°           0.0287



Table 1. Some of the most prominent clusters in the reconstruction with their corresponding right ascension and declination in degrees and redshift. Note that 
the right ascension angle as shown in Fig. 6 is indicated in parentheses and can be calculated as α - 360° for α > 180°. 
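The angle conversion quoted in the caption can be written as a small helper. This is only an illustrative sketch: the function name `ra_to_longitude` is an assumption introduced here and is not part of ARGO. The right ascensions are those listed in Table 1.

```python
def ra_to_longitude(ra_deg):
    """Map a right ascension in degrees to the longitude used in the sky maps."""
    ra = ra_deg % 360.0
    # Angles up to 180 deg are unchanged; larger ones are shifted by -360 deg.
    return ra - 360.0 if ra > 180.0 else ra


# Right ascensions of three Abell clusters from Table 1.
for abell_id, ra in [("A1656", 195.0), ("A1367", 176.0), ("A2151", 241.0)]:
    print(abell_id, ra_to_longitude(ra))
```

The values returned for these three clusters can be compared directly with the numbers given in parentheses in the table.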



our reconstruction, as can be seen in panel (b) of Fig. 6, since it is 
partially located in the major gap of DR6; it should therefore be 
better detected with DR7. 

The Hercules supercluster also belongs to the CfA2 Great 
Wall. Most of the clusters which belong to this supercluster can be 
identified in the reconstructed area. Since the spatial range of these 
clusters is large, we have listed in Table 1 the groups of clusters, with 
their respective locations on the sky, which appear as especially 
prominent overdensity regions in the projected reconstruction (for 
references see Abell et al. 1989; Struble & Rood 1999). Note that 
close-by structures such as the Virgo Cluster, which is at a distance 
of only about 18 Mpc from us, cannot be detected in 
our reconstruction, because the lower limit of our sample is set at 
z = 0.01. 



5.3 Detection of a great void region 

The scorpion-like form of the matter distribution spanning the 
whole observed region in Fig. 9 (see mask in panel (a)) shows large 
connected filamentary structures with many clusters. Interestingly, 
an extremely large void spans the region with -150 Mpc 
< Y < 30 Mpc and 70 Mpc < Z < 220 Mpc (see panels (a), (b) 
and (c) in Fig. 9). In order to evaluate the confidence of the detection 
one should check how deeply this region has been scanned 
by SDSS. By inspection of the three-dimensional mask we confirm 
a fairly high completeness ranging from about 30% to about 65% 
(see panel (a) of Fig. 9). The extension along the X-axis is still unclear, 
since the gap in the mask grows in the void region towards larger 
distances from the observer. ARGO predicts an extension of about -450 
Mpc < X < -250 Mpc. From our results, we can tell that it is one 
of the largest voids in the reconstructed volume, having a diameter 
of about 150 Mpc. Conclusive results can only be obtained after 
investigating DR7, which fills the main gaps. Since, in this case, 
a proper treatment of the DR7 mask is required and this mask was 
not public at the time this project started, we postpone this study to 
later work. The large overdensity region found in the unobserved 
region at about -30 Mpc < Y < 30 Mpc and 370 Mpc < Z < 430 
Mpc results from the correlation with a huge cluster region which 
extends over the range -30 Mpc < Y < 30 Mpc and 350 Mpc < Z < 
450 Mpc and which can be best seen at about X ≈ -170 Mpc (see 
panels (e) and (f) in Fig. 9). 



5.4 Cluster prediction 

The signal-space representation of the Wiener-filter (see section 
3.3.1) enables us to deal with unobserved regions, i.e. cells with 
zero completeness. Note that for those cells the noise term vanishes 
in the Wiener-filter expression (Eq. 30). The filter can then 
be regarded as a convolution with the non-diagonal autocorrelation 
matrix of the underlying signal, propagating the information 
from the windowed region into the unobserved cells. This gives 
a prediction for the Large-Scale Structure in these regions. Such 
an extrapolation can be clearly seen in panels (b), (c) and (d) of 
Fig. 6. These show the projected three-dimensional reconstruction 
on the sky without smoothing and after a convolution with a Gaussian 
with a smoothing radius rs of 5 and 10 Mpc, respectively. In 
these plots the gaps are hardly distinguishable, due to the signal 
prediction given by the Wiener-filter. We have chosen a slice in 
which the propagation of the information through gaps can be 
analyzed. In panel (a) of Fig. 10 we can see the three-dimensional 
mask through our selected slice. The main gap crosses the entire 
box along the Y-axis and reaches a width of about 50 Mpc along the 
Z-axis. Several other smaller gaps are distributed in the slice. In the 
reconstruction in panel (b) we can see how the main gap is partially 
filled with some diffuse overdensity structures which are produced 
precisely as described above. Panel (c) shows the same reconstruction 
smoothed with a Gaussian kernel with a smoothing radius of 
rs = 5 Mpc. Overplotted is the mask showing the regions which 
were observed. We identify seven clusters close to gaps extending 
into unobserved regions at a slice around -265 Mpc < X < 
Figure 7. Different radial slices around the Sloan Great Wall. Shown are projections of the three-dimensional matter field reconstruction on the sky considering 
only cells with a comoving distance between 290 Mpc and 310 Mpc (panel (a)), 300 Mpc and 320 Mpc (panel (b)), 310 Mpc and 330 Mpc (panel (c)), and 320 
Mpc and 340 Mpc (panel (d)). Note that the longitude angles -90°, -120°, -150° and -180° correspond to right ascension angles of 270°, 240°, 210° and 180°, 
respectively, with the positive angles being equal. For a general right ascension angle α, the longitude is calculated as α - 360° for α > 180°. The latitude 
angles are identical to the declination angles. 
Figure 8. Slices around the Sloan and the CfA2 Great Wall. Panel (a): slice through the three-dimensional mask multiplied with the selection function at ~7 
Mpc in the Z-axis. Panels (b), (c), and (d) show slices through the reconstruction after taking the mean over 20 neighboring slices around the slice at ~7 Mpc 
in the Z-axis, without smoothing, convolved with a Gaussian kernel with a smoothing radius of rs = 5 Mpc and rs = 10 Mpc, respectively. Note that panel 
(b) represents log(1 + δ), whereas panels (c) and (d) show δ. 



-245 Mpc (see clusters c1-c7 in Tab. 2). In addition, there are some 
weaker detections (see clusters c8-c10 in Tab. 2). The gap into which 
cluster c1 extends and the largest gap are the ones in which 
most information propagation occurs. There is an especially interesting 
region in the main gap around -140 Mpc < Y < 30 Mpc in 
which the algorithm predicts a high chance of finding overdense 
structures. The rest of the gaps retain low density values, since no 
prominent structures are in their vicinity. We investigate the public 
DR7 archive (see Section 2) to check for overdense regions in the 
gap. Note that without a full angular and radial selection function 
treatment a quantitative comparison is not possible. We restrict our 
study to gridding the galaxy sample with NGP, ignoring mask or 
selection function effects, and convolving it with a Gaussian kernel 
with a smoothing radius of rs = 10 Mpc (see panel (d) in Fig. 10). 



Though faint features like the filaments lying at around -230 Mpc 
< Y < -130 Mpc cannot be recovered, stronger features like the 
clusters located at -100 Mpc < Y < 0 Mpc show that there is indeed 
an overdense region in the gap, confirming our prediction based on 
DR6. In particular, the extensions of the clusters c1 and c2 are very 
well predicted by our algorithm. Cluster c10 is weakly predicted. 
The filament connecting clusters c8 and c10 is predicted by ARGO, 
perhaps by chance, but the resemblance in the gap of the reconstruction 
to the real underlying distribution shows that use of the 
correlation function of the LSS allows for plausible predictions. 
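The propagation of information into unobserved cells can be illustrated with a one-dimensional toy version of the signal-space Wiener filter, s = (S^-1 + R^T N^-1 R)^-1 R^T N^-1 d. The sketch below is not the ARGO implementation. It assumes a prior covariance S_ij = rho^|i-j| (chosen because its inverse is tridiagonal), a response R = diag(w) given by the completeness, and a white noise covariance N; the function names, the correlation length and the noise variance are all hypothetical choices made for this illustration.

```python
import math


def solve_tridiagonal(lower, diag, upper, rhs):
    """Solve a tridiagonal system with the Thomas algorithm.

    lower[i] and upper[i] are the entries left and right of diag[i];
    lower[0] and upper[-1] are ignored.
    """
    n = len(diag)
    c = [0.0] * n  # scaled upper diagonal after forward elimination
    d = [0.0] * n  # scaled right-hand side after forward elimination
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        pivot = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / pivot
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / pivot
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def wiener_filter_1d(data, completeness, corr_length=5.0, noise_var=0.1):
    """Toy signal-space Wiener filter on a 1-D grid (assumed prior, see text)."""
    n = len(data)
    rho = math.exp(-1.0 / corr_length)
    norm = 1.0 / (1.0 - rho * rho)
    # Inverse of the assumed prior covariance S_ij = rho**|i-j| (tridiagonal).
    diag = [norm * (1.0 + rho * rho)] * n
    diag[0] = diag[-1] = norm
    off = [-norm * rho] * n
    rhs = [0.0] * n
    for i, (d_i, w_i) in enumerate(zip(data, completeness)):
        diag[i] += w_i * w_i / noise_var  # R^T N^-1 R, zero where w = 0
        rhs[i] = w_i * d_i / noise_var    # R^T N^-1 d
    return solve_tridiagonal(off, diag, off, rhs)


n_cells = 40
# Cells 20..29 form a gap in the mask (zero completeness).
completeness = [0.0 if 20 <= i < 30 else 1.0 for i in range(n_cells)]
data = [0.0] * n_cells
data[19] = 1.0  # a single overdense cell at the edge of the gap
recon = wiener_filter_1d(data, completeness)
print([round(v, 3) for v in recon[18:31]])
```

In this toy setup the reconstruction inside the gap is non-zero and decays with distance from the observed overdensity, which is the qualitative behaviour described above for clusters located close to gaps.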
Figure 9. Panel (a): slice through the three-dimensional mask multiplied with the selection function at ~-109 Mpc in the X-axis. Panels (b), (c), and (d) show 
slices through the reconstruction after taking the mean over 20 neighboring slices around the slice at ~-109 Mpc in the X-axis, without smoothing, convolved 
with a Gaussian kernel with a smoothing radius of rs = 5 Mpc and rs = 10 Mpc, respectively. Panel (e): slice through the three-dimensional mask multiplied 
with the selection function at ~-168 Mpc in the X-axis. Panel (f) shows a slice through the reconstruction after taking the mean over 20 neighboring slices 
around the slice at ~-168 Mpc in the X-axis without smoothing. Note that panels (b) and (f) represent log(1 + δ), whereas panels (c) and (d) show δ. 
Cluster   Y range                   Z range
c7
c8
c9        -110 Mpc < Y < -90 Mpc    70 Mpc < Z < 90 Mpc
c10       -60 Mpc < Y < -40 Mpc
Table 2. Approximate positions of cluster candidates c_i (with i ranging from 1 to 10) at a slice around -265 Mpc < X < -245 Mpc in the reconstructed box 
which are located close to gaps (see Fig. 10). 



5.5 Statistics of the density field 

From a physical point of view, one would expect a log-normal 
distribution of smoothed density for a certain range of smoothing 
scales, if one assumes an initial Gaussian velocity field and extrapolates 
the continuity equation for the matter flow into the nonlinear 
regime with linear velocity fluctuations (see Coles & Jones 1991). 
Since the log-normal field is not able to describe caustics, we expect 
this distribution to fail below a threshold smoothing scale. There 
should also be a transition at a certain scale between this quasi-linear 
regime and the linear regime where the matter field is still 
Gaussian distributed. Due to the use of the Wiener-filter, which 
considers only the correlation function to reconstruct the density field, 
and the Gaussian smoothing, we expect the density field to be closely 
Gaussian distributed in the unobserved regions. Here, we analyse 
the statistical distribution of the density field by counting the number 
of cells at different densities with a density binning of 0.03 in 
(1 + δ_m) at different scales, defined by convolving the reconstruction 
with a Gaussian kernel with smoothing radii rs of 10, 20, and 
30 Mpc. We performed the analysis for different radial shells in the 
Δr ranges 0 < r < 200 Mpc, 200 < r < 400 Mpc, r > 400 
Mpc, and 0 < r < 600 Mpc, separating observed (w > 0) and 
unobserved (w = 0) regions (see Figs. 11 and 12). Note that due 
to shot noise, we are missing power in the filtered reconstruction 
on small scales. Moreover, the discrete Fourier representation of 
the signal implies negative densities (see Jasche et al. 2009). This 
obliges us to perform this statistical analysis on scales larger than 
the smallest grid scales. We can see this in the excess of low density 
cells for the dashed black curve (rs = 5 Mpc). In addition to that, 
we are also limited by the size of the box, having less information 
as we go to larger and larger scales. This effect can be appreciated 
in the stronger deviation from the log-normal fit around the peak 
for the green line (rs = 30 Mpc). For this reason, we restrict this 
analysis to the range of scales given above. The plots in Figs. 11 
and 12 show how the distribution tends towards Gaussianity as we 
go to larger and larger scales. 

We calculated the skewness and kurtosis to quantify the deviation 
from Gaussianity. Let us define here the statistical quantities 
required for our analysis. The number of cells contained in a shell 
of radial range Δr is given by the sum of the number counts in each 
density bin 

$$ N^{\Delta r}_{\rm cells} = \sum_{i}^{N_{\rm bins}} f_{\Delta r,i} . \qquad (51) $$



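The mean overdensity in Δr, which is very close to zero, is calculated 
as: 

$$ \bar{\delta}_{\Delta r} = \frac{1}{N^{\Delta r}_{\rm cells}} \sum_{i}^{N_{\rm bins}} f_{\Delta r,i} \, \delta^{B}_{\Delta r,i} , \qquad (52) $$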



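with the superscript B standing for bin. These two previously defined 
quantities permit us to calculate the central n-moments μ_n 
of the distribution with: 

$$ \mu_n(\Delta r) = \frac{1}{N^{\Delta r}_{\rm cells}} \sum_{i}^{N_{\rm bins}} f_{\Delta r,i} \left( \delta^{B}_{\Delta r,i} - \bar{\delta}_{\Delta r} \right)^n . \qquad (53) $$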

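Note that the variance is just the second moment: σ² = μ_2. Now 
we can define the skewness: 

$$ s(\Delta r) \equiv \frac{\mu_3(\Delta r)}{\sigma^3(\Delta r)} , \qquad (54) $$

and the kurtosis: 

$$ k(\Delta r) \equiv \frac{\mu_4(\Delta r)}{\sigma^4(\Delta r)} - 3 . \qquad (55) $$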



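Let us also introduce Pearson's skewness, defined as the mean $\bar{\delta}_{\Delta r}$ 
minus the mode $\delta^{B}_{\max(f)}$ (the overdensity bin with the maximum 
number of counts max(f)), normalized by the square root of the variance: 

$$ s_{\rm P}(\Delta r) \equiv \frac{\bar{\delta}_{\Delta r} - \delta^{B}_{\max(f(\Delta r))}}{\sigma(\Delta r)} . \qquad (56) $$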
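The quantities defined above can be computed directly from a histogram of cell counts. The following is a minimal sketch, assuming the usual definitions of skewness (third central moment over σ³) and excess kurtosis (fourth central moment over σ⁴, minus 3), which vanish for a Gaussian distribution, and Pearson's skewness as (mean - mode)/σ. The function name `histogram_statistics` and the example counts are hypothetical and are not taken from the reconstruction.

```python
import math


def histogram_statistics(counts, centers):
    """Return mean, variance, skewness, excess kurtosis and Pearson's skewness.

    counts[i] is the number of cells in density bin i and centers[i] is the
    overdensity at the center of that bin.
    """
    n_cells = sum(counts)
    mean = sum(f * d for f, d in zip(counts, centers)) / n_cells

    def central_moment(order):
        total = sum(f * (d - mean) ** order for f, d in zip(counts, centers))
        return total / n_cells

    variance = central_moment(2)
    sigma = math.sqrt(variance)
    skewness = central_moment(3) / sigma ** 3
    kurtosis = central_moment(4) / sigma ** 4 - 3.0
    mode = centers[counts.index(max(counts))]  # bin with the most cells
    pearson = (mean - mode) / sigma
    return mean, variance, skewness, kurtosis, pearson


# Hypothetical histogram with a long tail towards high densities.
counts = [5, 3, 1]
centers = [0.1, 0.2, 0.3]
mean, variance, skewness, kurtosis, pearson = histogram_statistics(counts, centers)
print(mean, variance, skewness, kurtosis, pearson)
```

For a histogram with a tail towards high densities, as in this example, both the skewness and Pearson's skewness come out positive.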

The results are shown in Figs. 11 and 12, demonstrating large deviations 
from Gaussianity in the observed regions and negligible deviations 
for the unobserved regions. Since the Wiener filter uses only 
the first two moments of the matter distribution, we do not expect 



Note that we considered the density at the center of the bins. 

Note that for a Gaussian distribution: s = 0. 

Note that for a Gaussian distribution: μ_4/σ^4 = 3 and thereby: k = 0. 
Figure 10. Panel (a): slice through the three-dimensional mask multiplied with the selection function at ~-256 Mpc in the X-axis. Panels (b) and (c) show slices 
through the reconstruction after taking the mean over 20 neighboring slices around the slice at ~-256 Mpc in the X-axis, without smoothing and convolved 
with a Gaussian kernel with a smoothing radius of rs = 5 Mpc, respectively. Panel (d): DR7 sample gridded with NGP and convolved with a Gaussian kernel 
with a smoothing radius of rs = 5 Mpc. In panels (c) and (d) the DR6 mask is over-plotted. Note that there is some correspondence between the structures 
predicted in the gap from the Sample dr6fix and the observed galaxy distribution there in DR7. Note that panel (b) represents log(1 + δ), whereas panels 
(c) and (d) show δ. 



large deviations from Gaussianity in the unobserved regions where 
there is almost no data constraining the result. Note that in Figs. 11 
and 12 the skewness and kurtosis are also given (skewness: s10, s20, 
s30, kurtosis: k10, k20, k30, Pearson's skewness: sP10, sP20, sP30, 
with the subscript denoting the smoothing radius in Mpc). Pearson's 
skewness is always larger for the observed regions than for 
the unobserved regions after smoothing with rs = 10 and rs = 20 
Mpc, and all distributions show a positive skewness. The skewness 
and kurtosis values show that the matter distribution starts to be 
closely Gaussian distributed after smoothing with a radius rs of 30 
Mpc. Nevertheless, for the region 200 < r < 400 Mpc we find 
a large deviation from Gaussianity even at that scale. Large-scale 
structures like the Sloan Great Wall can be responsible for this. 
Furthermore, we analyzed in great detail the matter distribution in 
the region 0 < r < 600 Mpc, which has better statistics. On the 
right panel of Fig. 12 we can see the statistics for the unobserved 
region. The dashed curves show the measured distributions at different 
scales (black: rs = 10 Mpc, red: rs = 20 Mpc, green: rs = 30 
Mpc). We calculated the means and the variances for each distribution 
and plotted the corresponding Gaussian distributions with light 
dashed-dotted lines. 

Figure 11. Statistical distribution of cells at different densities with a density binning of 0.03 in (1 + δ_m). The curves represent the distribution for the 
reconstructed matter field at different scales (rs: continuous: 10 Mpc, dashed: 20 Mpc, dotted: 30 Mpc). The upper panels show the statistics at different radial 
shells in the observed region (w > 0), and the lower panels show the same in the unobserved region (w = 0). The corresponding skewness: s10, s20, s30, 
kurtosis: k10, k20, k30, and Pearson's skewness: sP10, sP20, sP30 are also given. 

[Figure 12: two panels for the shell 0 < r < 600 Mpc; horizontal axis: 1 + δ. Legend of the first panel: s10 = 1.6, s20 = 1.0, s30 = 0.7, k10 = 4.3, k20 = 1.7, k30 = 0.9, sP10 = 0.6, sP20 = 0.6, sP30 = 0.1. The second panel is labelled w = 0.] 

Figure 12. Statistical distribution of cells at different densities with a density binning of 0.03 in (1 + δ_m). The dashed curves represent the distribution for the 
reconstructed matter field at different scales (rs: black: 10 Mpc, red: 20 Mpc, green: 30 Mpc). The corresponding skewness: s10, s20, s30, kurtosis: k10, k20, 
k30, and Pearson's skewness: sP10, sP20, sP30 are also given. On the left (observed region: w > 0): continuous lines: best-fit log-normal distributions using 
a nonlinear least squares fit based on a gradient-expansion algorithm; dashed-dotted curves: Gaussian distributions for the measured means and variances. 
On the right (unobserved region: w = 0): continuous lines: Gaussian distributions for the measured means and variances with the corresponding statistical 
correlation coefficients r20, r40, and r60. 

On the left panel of Fig. 12 we can see the statistics for the observed 
region, with the dashed curves showing again the measured 
distributions at different scales (black: rs = 10 Mpc, red: rs = 20 
Mpc, green: rs = 30 Mpc). We modelled the distribution by a log-normal 
(see Coles & Jones 1991) and calculated the best fit using 
a nonlinear least squares fit based on a gradient-expansion algorithm. 

For that, we parameterized the log-normal distribution as: 

$$ P(\delta_m \mid \mathbf{p}) = \frac{a}{1+\delta_m} \exp\left[ b \left( \log(1+\delta_m) - c \right)^2 \right] , \qquad (57) $$

with p = [a, b, c] being a set of parameters. The results of the best 
fits normalized with the number of cells are shown as the continuous 
lines on the left panel in Fig. 12. One can appreciate in all 
curves for w > 0 short tails towards low densities and long tails 
towards high densities, showing a clear deviation from Gaussianity. 
The measured distributions are well fitted by the log-normal distribution 
of smoothed density for smoothing radii rs of 10, 20, and 
30 Mpc. We also calculated the mean and the variance and plotted 
the corresponding Gaussian distributions with light dashed-dotted 
lines. We therefore conclude that the distribution of the matter field 
is in good agreement with the log-normal distribution at least in the 
scale range 10 Mpc < rs < 30 Mpc. This result is especially 
strong, since we did not assume a log-normal prior distribution 
in the reconstruction method. From a frequentist approach the 
Wiener-filter just gives the least squares estimator without imposing 
any statistical distribution on the matter distribution. The picture 
from a Bayesian perspective is more precise: a Gaussian prior distribution 
for the underlying density field is assumed. The posterior 
distribution, however, is conditioned on the data, which finally imposes 
its statistical behavior onto the reconstruction, as can be seen 
in our results. 
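A three-parameter log-normal fit of this kind can be sketched as follows. Note the assumptions: the code takes the parameterization to be P = a/(1+δ) exp[b (log(1+δ) - c)²] with a natural logarithm, and it does not use the gradient-expansion algorithm of the original fit. Instead it swaps in a simpler linearised least-squares fit, exploiting the fact that log[(1+δ)P] is a quadratic polynomial in log(1+δ) for this model. All function names and parameter values are hypothetical, and the test data are synthetic.

```python
import math


def solve_3x3(matrix, rhs):
    """Gaussian elimination with partial pivoting for a 3x3 linear system."""
    a = [row[:] + [r] for row, r in zip(matrix, rhs)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for row in range(col + 1, 3):
            factor = a[row][col] / a[col][col]
            for k in range(col, 4):
                a[row][k] -= factor * a[col][k]
    x = [0.0] * 3
    for row in (2, 1, 0):
        tail = sum(a[row][k] * x[k] for k in range(row + 1, 3))
        x[row] = (a[row][3] - tail) / a[row][row]
    return x


def lognormal_pdf(delta, a, b, c):
    """Assumed model: a / (1 + delta) * exp(b * (ln(1 + delta) - c)**2)."""
    x = math.log(1.0 + delta)
    return a / (1.0 + delta) * math.exp(b * (x - c) ** 2)


def fit_lognormal(deltas, pdf_values):
    """Fit (a, b, c) through a quadratic least-squares fit in log space.

    Only bins with strictly positive pdf values can be used.
    """
    xs = [math.log(1.0 + d) for d in deltas]
    ys = [math.log((1.0 + d) * p) for d, p in zip(deltas, pdf_values)]
    # Normal equations for y = q0 + q1 * x + q2 * x**2.
    power_sums = [sum(x ** k for x in xs) for k in range(5)]
    matrix = [[power_sums[i + j] for j in range(3)] for i in range(3)]
    rhs = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(3)]
    q0, q1, q2 = solve_3x3(matrix, rhs)
    b = q2
    c = -q1 / (2.0 * q2)
    a = math.exp(q0 - b * c * c)
    return a, b, c


# Synthetic, noise-free test data generated from hypothetical parameters.
true_params = (2.0, -3.0, -0.05)
deltas = [-0.5 + 0.03 * i for i in range(60)]
pdf_values = [lognormal_pdf(d, *true_params) for d in deltas]
fitted = fit_lognormal(deltas, pdf_values)
print(fitted)
```

The linearised fit weights the bins differently from a nonlinear least-squares fit on P itself, so on noisy histograms the two approaches would in general give somewhat different parameters.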



6 CONCLUSIONS 

We have presented the first application of the ARGO computer code 
to observational data. In particular, we have performed a reconstruction 
of the density field based on data from Sample dr6fix 
of the New York University Value Added Catalogue (NYU-VAGC) 
(see section 3). This yielded the largest Wiener-reconstruction of 
the Large-Scale Structure made to date, requiring the effective 
inversion of a matrix with about 10^8 x 10^8 entries. The use of 
optimized iterative inversion schemes within an operator formalism 
(see Kitaura & Enßlin 2008), together with a careful treatment of 
aliasing effects (see Jasche et al. 2009), permitted us to recover the 
field on a Mpc mesh with an effective resolution of the order of 
~10 Mpc. Furthermore, we have investigated in detail the statistical 
problem, in particular the noise covariance employed for performing 
Wiener-reconstructions. 

We have demonstrated that Wiener-filtering leads to different 
results than those obtained by the commonly used method of 
inverse weighting the galaxies with the selection function. Both 
methods are comparable when the galaxy number counts per cell 
are high. However, in regions with sparse observed galaxy densities 
inverse weighting delivers very noisy reconstructions. This finding 
could have important consequences for power-spectrum estimation 
and galaxy biasing estimation on large scales. 
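The difference between the two estimators can be illustrated for a single cell. This is a deliberately simplified sketch and not the noise model of the reconstruction: it assumes Poisson counts with expectation w n̄ (1 + δ), so that the inverse-weighted overdensity has a noise variance of roughly 1/(w n̄), and it applies a scalar Wiener weight S/(S + N) with an assumed prior variance S. The function names and the numbers are hypothetical.

```python
def inverse_weighting(count, completeness, mean_count):
    """Overdensity estimate obtained by dividing the counts by the selection."""
    return count / (completeness * mean_count) - 1.0


def wiener_single_cell(count, completeness, mean_count, signal_var):
    """Scalar Wiener estimate: shrink the inverse-weighted value by S / (S + N)."""
    noise_var = 1.0 / (completeness * mean_count)  # assumed Poisson shot noise
    weight = signal_var / (signal_var + noise_var)
    return weight * inverse_weighting(count, completeness, mean_count)


# Well sampled cell: 20 galaxies observed where 10 are expected on average.
dense_iw = inverse_weighting(20, 1.0, 10.0)
dense_wf = wiener_single_cell(20, 1.0, 10.0, signal_var=1.0)

# Sparsely sampled cell: one galaxy observed at 5 per cent completeness.
sparse_iw = inverse_weighting(1, 0.05, 2.0)
sparse_wf = wiener_single_cell(1, 0.05, 2.0, signal_var=1.0)

print(dense_iw, dense_wf)
print(sparse_iw, sparse_wf)
```

In the well sampled cell the two estimates are close, whereas in the sparsely sampled cell a single galaxy produces a very large inverse-weighted overdensity that the Wiener weight suppresses strongly.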

As part of the results, the Sloan Great Wall has been presented 
in detail (see section 5) and some other prominent structures, like 
the Coma, the Leo, and the Hercules Cluster, have been discussed, 
as well as the detection of a large void region (see section 5.3). 
Our results also show the detection of overdensity regions close to 
edges of the mask and predictions for structures within gaps in 
the mask which compare well with the DR7 data in which the gaps 
are filled (see section 5.4). Finally, we have analyzed the statistical 
distribution of the density field, finding good agreement with the 
log-normal distribution for Gaussian smoothing with radii in the 
range 10 Mpc < rs < 30 Mpc. We hope that this work highlights 
the potential of Bayesian large-scale structure reconstructions for 
cosmology and is helpful in establishing them as a widely used 
technique. 

CURVEFIT from IDL. 
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